[Fix] fix offline env use local mode path #22526
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 …
Code Review
This pull request introduces a fix for using local models in an offline environment by correctly handling model paths when HF_HUB_OFFLINE is set. The changes involve modifying EngineArgs to resolve the model path at initialization and updating configuration loading to respect the offline setting. The overall approach is sound. I've left one comment suggesting that a new function be refactored to reduce code duplication and improve maintainability.
Force-pushed from 4448b32 to 7d8d36d
Can you add a regression test for this? (See Lines 60 to 78 in e789cad.)
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from 7d8d36d to 1a32716
vllm/engine/arg_utils.py (Outdated)
```python
def get_model_path(model: Union[str, Path], revision: Optional[str] = None):
    if os.path.exists(model):
        return model

    common_kwargs = {
        "local_files_only": huggingface_hub.constants.HF_HUB_OFFLINE,
        "revision": revision,
    }

    if envs.VLLM_USE_MODELSCOPE:
        from modelscope.hub.snapshot_download import snapshot_download
        return snapshot_download(model_id=model, **common_kwargs)

    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=model, **common_kwargs)
```
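For context, a hypothetical invocation of this helper (not part of the diff) under offline mode:

```python
# Hypothetical usage, assuming HF_HUB_OFFLINE=1 and the repo already cached:
# local_files_only becomes True, so snapshot_download resolves the snapshot
# from the local cache, raising LocalEntryNotFoundError if it was never
# downloaded, rather than contacting the Hub.
local_path = get_model_path("Qwen/Qwen3-1.7B")
print(local_path)
# e.g. ~/.cache/huggingface/hub/models--Qwen--Qwen3-1.7B/snapshots/<commit>
```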
Hmmm, I don't think we should download the whole model repo here... The model's weights should be downloaded by the model loader instead.
vllm/vllm/model_executor/model_loader/weight_utils.py
Lines 259 to 314 in 67c153b
```python
def download_weights_from_hf(
    model_name_or_path: str,
    cache_dir: Optional[str],
    allow_patterns: list[str],
    revision: Optional[str] = None,
    ignore_patterns: Optional[Union[str, list[str]]] = None,
) -> str:
    """Download model weights from Hugging Face Hub.

    Args:
        model_name_or_path (str): The model name or path.
        cache_dir (Optional[str]): The cache directory to store the model
            weights. If None, will use HF defaults.
        allow_patterns (list[str]): The allowed patterns for the
            weight files. Files matched by any of the patterns will be
            downloaded.
        revision (Optional[str]): The revision of the model.
        ignore_patterns (Optional[Union[str, list[str]]]): The patterns to
            filter out the weight files. Files matched by any of the patterns
            will be ignored.

    Returns:
        str: The path to the downloaded model weights.
    """
    local_only = huggingface_hub.constants.HF_HUB_OFFLINE
    if not local_only:
        # Before we download we look at that is available:
        fs = HfFileSystem()
        file_list = fs.ls(model_name_or_path, detail=False, revision=revision)

        # depending on what is available we download different things
        for pattern in allow_patterns:
            matching = fnmatch.filter(file_list, pattern)
            if len(matching) > 0:
                allow_patterns = [pattern]
                break

    logger.info("Using model weights format %s", allow_patterns)
    # Use file lock to prevent multiple processes from
    # downloading the same model weights at the same time.
    with get_lock(model_name_or_path, cache_dir):
        start_time = time.perf_counter()
        hf_folder = snapshot_download(
            model_name_or_path,
            allow_patterns=allow_patterns,
            ignore_patterns=ignore_patterns,
            cache_dir=cache_dir,
            tqdm_class=DisabledTqdm,
            revision=revision,
            local_files_only=local_only,
        )
        time_taken = time.perf_counter() - start_time
        if time_taken > 0.5:
            logger.info("Time spent downloading weights for %s: %.6f seconds",
                        model_name_or_path, time_taken)
    return hf_folder
```
This only returns the local model path; because it's offline, it can't download model weights.
But if we don't use it under offline mode (likely by mistake), this function can still download from the internet. I'd prefer to make this function more robust by returning the cached model location directly, using something like try_to_load_from_cache (https://huggingface.co/docs/huggingface_hub/en/package_reference/cache#huggingface_hub.try_to_load_from_cache).
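A minimal sketch of that suggestion (hypothetical helper, not this PR's code; try_to_load_from_cache works per file, so it probes config.json, which every HF model repo ships):

```python
import os
from typing import Optional

from huggingface_hub import try_to_load_from_cache

def get_cached_model_path(model: str, revision: Optional[str] = None):
    # try_to_load_from_cache never touches the network: it returns a local
    # path if the file is cached, or a sentinel/None if it is not.
    cached = try_to_load_from_cache(
        repo_id=model, filename="config.json", revision=revision)
    if isinstance(cached, str):
        # Drop the filename to get the snapshot directory.
        return os.path.dirname(cached)
    return None  # not cached locally
```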
When local_files_only is set to True, snapshot_download only reads local files and can't fetch from the remote repo; that behavior is built into the snapshot_download method:
https://github.com/huggingface/huggingface_hub/blob/b698915d6b582c72806ac3e91c43bfd8dde35856/src/huggingface_hub/_snapshot_download.py#L228-L234
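For illustration (a hedged sketch, not from the diff), this is the behavior being relied on:

```python
from huggingface_hub import snapshot_download
from huggingface_hub.utils import LocalEntryNotFoundError

try:
    # Resolves entirely from ~/.cache/huggingface/hub; no HTTP call is made.
    path = snapshot_download("Qwen/Qwen3-1.7B", local_files_only=True)
except LocalEntryNotFoundError:
    # Raised when the repo was never cached; there is no network fallback.
    path = None
```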
tests/test_regression.py (Outdated)
```python
def test_model_from_offline(monkeypatch: pytest.MonkeyPatch):
    # model: https://modelscope.cn/models/qwen/Qwen1.5-0.5B-Chat/summary
    with monkeypatch.context() as m:
        m.setenv("VLLM_USE_MODELSCOPE", "True")
        m.setenv("HF_HUB_OFFLINE", "True")
        # Don't use HF_TOKEN for ModelScope repos, otherwise it will fail
        # with 400 Client Error: Bad Request.
        m.setenv("HF_TOKEN", "")
        llm = LLM(model="Qwen/Qwen3-1.7B")

        prompts = [
            "Hello, my name is",
            "The president of the United States is",
            "The capital of France is",
            "The future of AI is",
        ]
        sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

        outputs = llm.generate(prompts, sampling_params)
        assert len(outputs) == 4
```
The current code directly accesses huggingface, ignoring the HF_HUB_OFFLINE configuration.
If I understand correctly, we can simply check in this test whether the code still tries to open a connection to huggingface?
How can I check whether a huggingface connection is created? Could you tell me? Thanks.
Let's move this regression test to tests/entrypoints/offline_mode/test_offline_mode.py; we will remove test_regression.py soon (#22874). You can refer to its implementation to avoid network connections:
vllm/tests/entrypoints/offline_mode/test_offline_mode.py
Lines 60 to 87 in a353bd0
```python
try:
    m.setenv("HF_HUB_OFFLINE", "1")
    m.setenv("VLLM_NO_USAGE_STATS", "1")

    def disable_connect(*args, **kwargs):
        raise RuntimeError("No http calls allowed")

    m.setattr(
        urllib3.connection.HTTPConnection,
        "connect",
        disable_connect,
    )
    m.setattr(
        urllib3.connection.HTTPSConnection,
        "connect",
        disable_connect,
    )

    # Need to re-import huggingface_hub
    # and friends to setup offline mode
    _re_import_modules()
    # Cached model files should be used in offline mode
    for model_config in MODEL_CONFIGS:
        LLM(**model_config)
finally:
    # Reset the environment after the test
    # NB: Assuming tests are run in online mode
    _re_import_modules()
```
BTW, I wonder why the existing offline-mode test suite didn't catch this issue as well?
According to my testing, there is some logic in EngineArgs that uses the model files, such as (Lines 988 to 991 in 00e3f9d):
```python
if self.speculative_config is None:
    hf_config = get_config(self.hf_config_path or self.model,
                           self.trust_remote_code, self.revision,
                           self.code_revision, self.config_format)
```
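A hedged sketch of how this PR's approach sidesteps that: resolve the model to a local path once when EngineArgs is initialized, so downstream calls like get_config receive a filesystem path and never reach for the Hub. This is an illustrative subset, not the exact merged diff:

```python
from dataclasses import dataclass
from typing import Optional

import huggingface_hub.constants

@dataclass
class EngineArgs:  # illustrative subset of the real class
    model: str
    revision: Optional[str] = None

    def __post_init__(self):
        # In offline mode, swap the repo id for its resolved local snapshot
        # path before get_config() and friends ever see it.
        # get_model_path is the helper from this PR's diff above.
        if huggingface_hub.constants.HF_HUB_OFFLINE:
            self.model = get_model_path(self.model, self.revision)
```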
Force-pushed from 1a32716 to 2115c6d
Force-pushed from 2115c6d to 5f24f33
Force-pushed from 5f24f33 to 4f88322
Force-pushed from 4f88322 to ef92311
Head branch was pushed to by a user without write access
Force-pushed from a297190 to abfffee
Fix CI error after replacing model_id with model_path.
Essential Elements of an Effective PR Description Checklist
- Update supported_models.md and examples for a new model.

Purpose
When 127.0.0.1 localhost www.modelscope.cn huggingface.co is set in /etc/hosts (pointing the Hub domains at localhost to simulate an offline environment) and the model has been downloaded beforehand, running now succeeds.

Test Plan
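A hedged sketch of reproducing the scenario programmatically (equivalent in spirit to the /etc/hosts trick above; assumes the model was downloaded while online beforehand):

```python
import os

# Must be set before huggingface_hub (imported via vllm) is loaded.
os.environ["HF_HUB_OFFLINE"] = "1"

from vllm import LLM, SamplingParams

# With this fix, the model resolves from the local cache instead of
# attempting to reach huggingface.co.
llm = LLM(model="Qwen/Qwen3-1.7B")
outputs = llm.generate(["Hello, my name is"],
                       SamplingParams(temperature=0.8, top_p=0.95))
print(outputs[0].outputs[0].text)
```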
Test Result
(Optional) Documentation Update