
Conversation

@lengrongfu (Contributor) commented Aug 8, 2025

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)": [Bug]: HF_HUB_OFFLINE Parameter does not take effect #22492
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

With the entry 127.0.0.1 localhost www.modelscope.cn huggingface.co added to /etc/hosts (so that huggingface.co and www.modelscope.cn are unreachable) and the model downloaded beforehand, vLLM now runs successfully.
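
For illustration, a minimal offline repro sketch (not part of the PR; the model name is an example and is assumed to already be in the local Hugging Face cache):

import os

# HF_HUB_OFFLINE must be set before importing vllm so that
# huggingface_hub picks it up when its constants are loaded.
os.environ["HF_HUB_OFFLINE"] = "1"

from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-1.7B")  # assumed to be cached locally
outputs = llm.generate(["Hello, my name is"],
                       SamplingParams(temperature=0.8, top_p=0.95))
print(outputs[0].outputs[0].text)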

Test Plan

Test Result

(Optional) Documentation Update

github-actions bot commented Aug 8, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces a fix for using local models in an offline environment by correctly handling model paths when HF_HUB_OFFLINE is set. The changes involve modifying EngineArgs to resolve the model path at initialization and updating configuration loading to respect the offline setting. The overall approach is sound. I've provided one comment to refactor a new function to improve maintainability by reducing code duplication.
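
To illustrate the second point (a hedged sketch only, not the PR's actual code: vLLM's get_config wraps the transformers loaders, and the repo id below is just an assumed example), respecting the offline setting during config loading essentially means forwarding the flag to the underlying loader:

import huggingface_hub.constants
from transformers import AutoConfig

# If HF_HUB_OFFLINE is set, only the local cache is consulted; otherwise
# the config may be fetched from the Hub as usual.
config = AutoConfig.from_pretrained(
    "Qwen/Qwen3-1.7B",  # example repo, assumed to be cached locally
    local_files_only=huggingface_hub.constants.HF_HUB_OFFLINE,
)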

@lengrongfu lengrongfu force-pushed the feat/add-hf-offline branch from 4448b32 to 7d8d36d Compare August 8, 2025 15:46
@Isotr0py (Member) commented Aug 8, 2025

Can you add a regression test for this in test_regression.py?

def test_model_from_modelscope(monkeypatch: pytest.MonkeyPatch):
    # model: https://modelscope.cn/models/qwen/Qwen1.5-0.5B-Chat/summary
    with monkeypatch.context() as m:
        m.setenv("VLLM_USE_MODELSCOPE", "True")
        # Don't use HF_TOKEN for ModelScope repos, otherwise it will fail
        # with 400 Client Error: Bad Request.
        m.setenv("HF_TOKEN", "")
        llm = LLM(model="qwen/Qwen1.5-0.5B-Chat")
        prompts = [
            "Hello, my name is",
            "The president of the United States is",
            "The capital of France is",
            "The future of AI is",
        ]
        sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
        outputs = llm.generate(prompts, sampling_params)
        assert len(outputs) == 4

mergify bot commented Aug 10, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @lengrongfu.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Comment on lines 1848 to 1833
def get_model_path(model: Union[str, Path], revision: Optional[str] = None):
    if os.path.exists(model):
        return model

    common_kwargs = {
        "local_files_only": huggingface_hub.constants.HF_HUB_OFFLINE,
        "revision": revision,
    }

    if envs.VLLM_USE_MODELSCOPE:
        from modelscope.hub.snapshot_download import snapshot_download
        return snapshot_download(model_id=model, **common_kwargs)

    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=model, **common_kwargs)
A Member commented:

Hmmm, I don't think we should download the whole model repo here... The model's weights should be downloaded by the model loader instead.

def download_weights_from_hf(
    model_name_or_path: str,
    cache_dir: Optional[str],
    allow_patterns: list[str],
    revision: Optional[str] = None,
    ignore_patterns: Optional[Union[str, list[str]]] = None,
) -> str:
    """Download model weights from Hugging Face Hub.

    Args:
        model_name_or_path (str): The model name or path.
        cache_dir (Optional[str]): The cache directory to store the model
            weights. If None, will use HF defaults.
        allow_patterns (list[str]): The allowed patterns for the
            weight files. Files matched by any of the patterns will be
            downloaded.
        revision (Optional[str]): The revision of the model.
        ignore_patterns (Optional[Union[str, list[str]]]): The patterns to
            filter out the weight files. Files matched by any of the patterns
            will be ignored.

    Returns:
        str: The path to the downloaded model weights.
    """
    local_only = huggingface_hub.constants.HF_HUB_OFFLINE
    if not local_only:
        # Before we download we look at that is available:
        fs = HfFileSystem()
        file_list = fs.ls(model_name_or_path, detail=False, revision=revision)

        # depending on what is available we download different things
        for pattern in allow_patterns:
            matching = fnmatch.filter(file_list, pattern)
            if len(matching) > 0:
                allow_patterns = [pattern]
                break

    logger.info("Using model weights format %s", allow_patterns)
    # Use file lock to prevent multiple processes from
    # downloading the same model weights at the same time.
    with get_lock(model_name_or_path, cache_dir):
        start_time = time.perf_counter()
        hf_folder = snapshot_download(
            model_name_or_path,
            allow_patterns=allow_patterns,
            ignore_patterns=ignore_patterns,
            cache_dir=cache_dir,
            tqdm_class=DisabledTqdm,
            revision=revision,
            local_files_only=local_only,
        )
        time_taken = time.perf_counter() - start_time
        if time_taken > 0.5:
            logger.info("Time spent downloading weights for %s: %.6f seconds",
                        model_name_or_path, time_taken)
    return hf_folder

@lengrongfu (Contributor, Author) commented Aug 14, 2025

This only returns the local model_path; because we are offline, it can't download the model weights.

@Isotr0py (Member) commented Aug 14, 2025

But if we don't use it under offline mode (likely it was used by mistake), this function can still download from the internet. I prefer to make this function more robust by returning the cached model location directly, using something like try_to_load_from_cache (https://huggingface.co/docs/huggingface_hub/en/package_reference/cache#huggingface_hub.try_to_load_from_cache).
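
A rough sketch of what such a cache-only lookup could look like (illustrative only; the helper name and the choice of config.json as the probe file are assumptions, not the PR's final code):

import os
from pathlib import Path
from typing import Optional, Union

from huggingface_hub import try_to_load_from_cache


def get_cached_model_path(model: Union[str, Path],
                          revision: Optional[str] = None) -> Optional[str]:
    """Illustrative helper: return the cached snapshot dir, or None."""
    if os.path.exists(model):
        return str(model)
    # try_to_load_from_cache only inspects the local cache; it returns the
    # cached file path, a "cached non-existence" sentinel, or None.
    cached_config = try_to_load_from_cache(repo_id=str(model),
                                           filename="config.json",
                                           revision=revision)
    if isinstance(cached_config, str):
        # The snapshot directory is the parent of the cached config file.
        return str(Path(cached_config).parent)
    return None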

@lengrongfu (Contributor, Author) commented:

Once local_files_only is set to True, snapshot_download only reads local files and cannot fetch from the remote repo; this is how the snapshot_download method behaves:
https://github.com/huggingface/huggingface_hub/blob/b698915d6b582c72806ac3e91c43bfd8dde35856/src/huggingface_hub/_snapshot_download.py#L228-L234
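
For reference, a minimal example of that behaviour (the repo id is just an example and is assumed to be cached): with local_files_only=True, snapshot_download resolves the snapshot purely from the local cache and raises LocalEntryNotFoundError if it is missing, without opening a connection.

from huggingface_hub import snapshot_download
from huggingface_hub.utils import LocalEntryNotFoundError

try:
    local_path = snapshot_download(
        repo_id="Qwen/Qwen3-1.7B",  # example repo, assumed to be cached
        local_files_only=True,      # never hit the network
    )
    print("resolved from cache:", local_path)
except LocalEntryNotFoundError:
    print("repo is not in the local cache")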

Comment on lines 81 to 100
def test_model_from_offline(monkeypatch: pytest.MonkeyPatch):
    # model: https://modelscope.cn/models/qwen/Qwen1.5-0.5B-Chat/summary
    with monkeypatch.context() as m:
        m.setenv("VLLM_USE_MODELSCOPE", "True")
        m.setenv("HF_HUB_OFFLINE", "True")
        # Don't use HF_TOKEN for ModelScope repos, otherwise it will fail
        # with 400 Client Error: Bad Request.
        m.setenv("HF_TOKEN", "")
        llm = LLM(model="Qwen/Qwen3-1.7B")

        prompts = [
            "Hello, my name is",
            "The president of the United States is",
            "The capital of France is",
            "The future of AI is",
        ]
        sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

        outputs = llm.generate(prompts, sampling_params)
        assert len(outputs) == 4
A Member commented:

The current code directly accesses huggingface, ignoring the HF_HUB_OFFLINE configuration.

If I understand correctly, we can simply check whether the code still tries to create a connection to huggingface in this test?

@lengrongfu (Contributor, Author) commented:

How can I check whether a huggingface connection is created? Could you tell me? Thanks.

A Member commented:

Let's move this regression test to tests/entrypoints/offline_mode/test_offline_mode.py; we will remove test_regression.py soon (#22874). You can refer to its implementation to avoid network connections:

try:
    m.setenv("HF_HUB_OFFLINE", "1")
    m.setenv("VLLM_NO_USAGE_STATS", "1")

    def disable_connect(*args, **kwargs):
        raise RuntimeError("No http calls allowed")

    m.setattr(
        urllib3.connection.HTTPConnection,
        "connect",
        disable_connect,
    )
    m.setattr(
        urllib3.connection.HTTPSConnection,
        "connect",
        disable_connect,
    )

    # Need to re-import huggingface_hub
    # and friends to setup offline mode
    _re_import_modules()
    # Cached model files should be used in offline mode
    for model_config in MODEL_CONFIGS:
        LLM(**model_config)
finally:
    # Reset the environment after the test
    # NB: Assuming tests are run in online mode
    _re_import_modules()

BTW, I wonder why the existing offline mode test suite couldn't catch this issue as well?

@lengrongfu (Contributor, Author) commented:

According to my test, there is some logic in EngineArgs that uses the model file, such as:

if self.speculative_config is None:
    hf_config = get_config(self.hf_config_path or self.model,
                           self.trust_remote_code, self.revision,
                           self.code_revision, self.config_format)

@lengrongfu lengrongfu force-pushed the feat/add-hf-offline branch from 5f24f33 to 4f88322 Compare August 15, 2025 03:33
@lengrongfu lengrongfu force-pushed the feat/add-hf-offline branch from 4f88322 to ef92311 Compare August 15, 2025 09:40
@Isotr0py Isotr0py enabled auto-merge (squash) August 18, 2025 08:08
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Aug 18, 2025
auto-merge was automatically disabled August 20, 2025 10:59

Head branch was pushed to by a user without write access

@lengrongfu lengrongfu force-pushed the feat/add-hf-offline branch from a297190 to abfffee Compare August 20, 2025 10:59
@lengrongfu (Contributor, Author) commented:

Fixed the CI error that appeared after replacing model_id with model_path.

@Isotr0py Isotr0py enabled auto-merge (squash) August 20, 2025 11:05
@Isotr0py Isotr0py merged commit 3821787 into vllm-project:main Aug 20, 2025
40 checks passed
divakar-amd pushed a commit to divakar-amd/vllm_upstream that referenced this pull request Aug 20, 2025
cyang49 pushed a commit to cyang49/vllm that referenced this pull request Aug 20, 2025
MatthewBonanni pushed a commit to MatthewBonanni/vllm that referenced this pull request Aug 20, 2025
djmmoss pushed a commit to djmmoss/vllm that referenced this pull request Aug 21, 2025
BoyuanFeng pushed a commit to BoyuanFeng/vllm that referenced this pull request Aug 21, 2025
kylesayrs pushed a commit to neuralmagic/vllm that referenced this pull request Aug 22, 2025
epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 28, 2025
juuice-lee pushed a commit to juuice-lee/vllm-moe.code that referenced this pull request Aug 28, 2025
xiao-llm pushed a commit to xiao-llm/vllm that referenced this pull request Aug 28, 2025
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Aug 28, 2025
dumb0002 pushed a commit to dumb0002/vllm that referenced this pull request Aug 28, 2025
2015aroras pushed a commit to 2015aroras/vllm that referenced this pull request Aug 29, 2025
mengxingkongzhouhan pushed a commit to mengxingkongzhouhan/vllm that referenced this pull request Aug 30, 2025