
[Bug]: Model loading from local path is broken when HF_HUB_OFFLINE is set to 1 #23684

@likith


Your current environment

The output of python collect_env.py
==============================
        System Info
==============================
OS                           : CentOS Stream 9 (x86_64)
GCC version                  : (GCC) 11.5.0 20240719 (Red Hat 11.5.0-9)
Clang version                : Could not collect
CMake version                : Could not collect
Libc version                 : glibc-2.34

==============================
       PyTorch Info
==============================
PyTorch version              : 2.7.1+cpu
Is debug build               : False
CUDA used to build PyTorch   : None
ROCM used to build PyTorch   : N/A

==============================
      Python Environment
==============================
Python version               : 3.12.11

==============================
       CUDA / GPU Info
==============================
Is CUDA available            : False
CUDA runtime version         : No CUDA
CUDA_MODULE_LOADING set to   : N/A
GPU models and configuration : No CUDA
Nvidia driver version        : No CUDA
cuDNN version                : No CUDA
HIP runtime version          : N/A
MIOpen runtime version       : N/A
Is XNNPACK available         : True

==============================
Versions of relevant libraries
==============================
[pip3] numpy==2.1.2
[pip3] nvidia-cublas-cu12==12.6.4.1
[pip3] nvidia-cuda-cupti-cu12==12.6.80
[pip3] nvidia-cuda-nvrtc-cu12==12.6.77
[pip3] nvidia-cuda-runtime-cu12==12.6.77
[pip3] nvidia-cufile-cu12==1.11.1.6
[pip3] nvidia-curand-cu12==10.3.7.77
[pip3] nvidia-cusparselt-cu12==0.6.3
[pip3] nvidia-nccl-cu12==2.26.2
[pip3] nvidia-nvjitlink-cu12==12.6.85
[pip3] nvidia-nvtx-cu12==12.6.77
[pip3] pyzmq==27.0.2
[pip3] torch==2.7.1+cpu
[pip3] torchaudio==2.7.1+cpu
[pip3] torchvision==0.22.1+cpu
[pip3] transformers==4.55.4
[pip3] triton==3.3.1
[conda] Could not collect

==============================
         vLLM Info
==============================
ROCM Version                 : Could not collect
Neuron SDK Version           : N/A
vLLM Version                 : 0.10.1.1
vLLM Build Flags:
  CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
  Could not collect

==============================
     Environment Variables
==============================
NCCL_CUMEM_ENABLE=0
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1

🐛 Describe the bug

For offline inference, model loading from a local path is broken when HF_HUB_OFFLINE is set to 1.

I am trying to load a model from a local path for offline inference using code similar to the snippet below. /local/path/to/model is a directory containing a Hugging Face model snapshot.

from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(model="/local/path/to/model")

outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

When I run this code after setting HF_HUB_OFFLINE=1, a LocalEntryNotFoundError exception is raised with this error message:

"Cannot find an appropriate cached snapshot folder for the specified revision on the local disk and outgoing traffic has been disabled. To enable repo look-ups and downloads online, pass 'local_files_only=False' as input."

This workflow worked without any issues until PR #22526 was merged.

After further debugging, we found that #22526 and #21680 appear to introduce conflicting changes. With #21680, vLLM logs non-default arguments, which works fine on its own. Combined with #22526, however, it causes a failure while logging the non-default args:

  1. After specifying /local/path/to/model, the lookup of the local model succeeds.
  2. Once the local model lookup is done, vLLM tries to log the non-default args.
  3. To determine which args are non-default, vLLM creates a default instance of EngineArgs(): https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/utils.py#L316
  4. In this default instance, model is set to "Qwen/Qwen3-0.6B": https://github.com/vllm-project/vllm/blob/main/vllm/config/__init__.py#L275
  5. With [Fix] fix offline env use local mode path #22526, __post_init__() of EngineArgs() now tries to load "Qwen/Qwen3-0.6B" from disk here: https://github.com/vllm-project/vllm/blob/main/vllm/engine/arg_utils.py#L463 (see the sketch after this list).
  6. That load fails with a complaint that "Qwen/Qwen3-0.6B" is not found in the local path. I do not have "Qwen/Qwen3-0.6B" in my local model cache, nor do I need it, since I am loading an entirely different model for inference.
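
The interaction in steps 3-5 can be reproduced in isolation. The snippet below is a minimal sketch based on our debugging, assuming vLLM 0.10.1.1 and a local HF cache that does not contain "Qwen/Qwen3-0.6B":

import os

os.environ["HF_HUB_OFFLINE"] = "1"

from vllm.engine.arg_utils import EngineArgs

# Creating a default instance (which is what the non-default-args logging
# helper does) runs __post_init__, which in turn tries to resolve the
# default model "Qwen/Qwen3-0.6B" on disk and fails with
# LocalEntryNotFoundError when that model is not in the local HF cache.
EngineArgs()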

During logging of non-default args, we should avoid loading the model; model resolution should not block logging. This issue is affecting our offline workflows with HF_HUB_OFFLINE=1. Please help take a look at this issue.
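
One possible direction is sketched below. This is not vLLM's actual code and the helper name is hypothetical: compare against the declared dataclass field defaults instead of instantiating a default EngineArgs(), so that __post_init__ (and the model lookup it now performs) never runs just to log non-default arguments.

from dataclasses import MISSING, fields

def get_non_default_args(args_obj):
    """Return {field_name: value} for fields whose current value differs
    from the declared dataclass default, without constructing a default
    instance of the dataclass."""
    diff = {}
    for f in fields(args_obj):
        if f.default is not MISSING:
            default = f.default
        elif f.default_factory is not MISSING:
            default = f.default_factory()
        else:
            # No default declared; always treat as non-default.
            diff[f.name] = getattr(args_obj, f.name)
            continue
        value = getattr(args_obj, f.name)
        if value != default:
            diff[f.name] = value
    return diff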

