[Fix] fix offline env use local mode path #22526
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 …
Code Review
This pull request introduces a fix for using local models in an offline environment by correctly handling model paths when HF_HUB_OFFLINE is set. The changes involve modifying EngineArgs to resolve the model path at initialization and updating configuration loading to respect the offline setting. The overall approach is sound. I've left one comment suggesting that a new function be refactored to reduce code duplication and improve maintainability.
Force-pushed from 4448b32 to 7d8d36d
Can you add a regression test for this? (See Lines 60 to 78 in e789cad.)
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from 7d8d36d to 1a32716
vllm/engine/arg_utils.py (Outdated)
```python
def get_model_path(model: Union[str, Path], revision: Optional[str] = None):
    if os.path.exists(model):
        return model

    common_kwargs = {
        "local_files_only": huggingface_hub.constants.HF_HUB_OFFLINE,
        "revision": revision,
    }

    if envs.VLLM_USE_MODELSCOPE:
        from modelscope.hub.snapshot_download import snapshot_download
        return snapshot_download(model_id=model, **common_kwargs)

    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=model, **common_kwargs)
```
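For context, a hypothetical invocation of this helper (not part of the diff) under offline mode:

```python
# Hypothetical usage, assuming HF_HUB_OFFLINE=1 and the repo already cached:
# local_files_only becomes True, so snapshot_download resolves the snapshot
# from the local cache, raising LocalEntryNotFoundError if it was never
# downloaded, rather than contacting the Hub.
local_path = get_model_path("Qwen/Qwen3-1.7B")
print(local_path)
# e.g. ~/.cache/huggingface/hub/models--Qwen--Qwen3-1.7B/snapshots/<commit>
```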
Hmmm, I don't think we should download the whole model repo here... The model's weights should be downloaded by the model loader instead.
vllm/vllm/model_executor/model_loader/weight_utils.py
Lines 259 to 314 in 67c153b
```python
def download_weights_from_hf(
    model_name_or_path: str,
    cache_dir: Optional[str],
    allow_patterns: list[str],
    revision: Optional[str] = None,
    ignore_patterns: Optional[Union[str, list[str]]] = None,
) -> str:
    """Download model weights from Hugging Face Hub.

    Args:
        model_name_or_path (str): The model name or path.
        cache_dir (Optional[str]): The cache directory to store the model
            weights. If None, will use HF defaults.
        allow_patterns (list[str]): The allowed patterns for the
            weight files. Files matched by any of the patterns will be
            downloaded.
        revision (Optional[str]): The revision of the model.
        ignore_patterns (Optional[Union[str, list[str]]]): The patterns to
            filter out the weight files. Files matched by any of the patterns
            will be ignored.

    Returns:
        str: The path to the downloaded model weights.
    """
    local_only = huggingface_hub.constants.HF_HUB_OFFLINE
    if not local_only:
        # Before we download we look at that is available:
        fs = HfFileSystem()
        file_list = fs.ls(model_name_or_path, detail=False, revision=revision)

        # depending on what is available we download different things
        for pattern in allow_patterns:
            matching = fnmatch.filter(file_list, pattern)
            if len(matching) > 0:
                allow_patterns = [pattern]
                break

    logger.info("Using model weights format %s", allow_patterns)
    # Use file lock to prevent multiple processes from
    # downloading the same model weights at the same time.
    with get_lock(model_name_or_path, cache_dir):
        start_time = time.perf_counter()
        hf_folder = snapshot_download(
            model_name_or_path,
            allow_patterns=allow_patterns,
            ignore_patterns=ignore_patterns,
            cache_dir=cache_dir,
            tqdm_class=DisabledTqdm,
            revision=revision,
            local_files_only=local_only,
        )
        time_taken = time.perf_counter() - start_time
        if time_taken > 0.5:
            logger.info("Time spent downloading weights for %s: %.6f seconds",
                        model_name_or_path, time_taken)
    return hf_folder
```
This only returns the local model path; because it's offline, it can't download model weights.
But if we don't use it under offline mode (likely by mistake), this function can still download from the internet. I'd prefer to make this function more robust by returning the cached model location directly, using something like try_to_load_from_cache (https://huggingface.co/docs/huggingface_hub/en/package_reference/cache#huggingface_hub.try_to_load_from_cache).
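A minimal sketch of that suggestion (hypothetical helper, not this PR's code; try_to_load_from_cache works per file, so it probes config.json, which every HF model repo ships):

```python
import os
from typing import Optional

from huggingface_hub import try_to_load_from_cache

def get_cached_model_path(model: str, revision: Optional[str] = None):
    # try_to_load_from_cache never touches the network: it returns a local
    # path if the file is cached, or a sentinel/None if it is not.
    cached = try_to_load_from_cache(
        repo_id=model, filename="config.json", revision=revision)
    if isinstance(cached, str):
        # Drop the filename to get the snapshot directory.
        return os.path.dirname(cached)
    return None  # not cached locally
```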
When local_files_only is set to True, snapshot_download only reads local files and can't fetch from the remote repo; that behavior is built into the snapshot_download method:
https://github.com/huggingface/huggingface_hub/blob/b698915d6b582c72806ac3e91c43bfd8dde35856/src/huggingface_hub/_snapshot_download.py#L228-L234
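For illustration (a hedged sketch, not from the diff), this is the behavior being relied on:

```python
from huggingface_hub import snapshot_download
from huggingface_hub.utils import LocalEntryNotFoundError

try:
    # Resolves entirely from ~/.cache/huggingface/hub; no HTTP call is made.
    path = snapshot_download("Qwen/Qwen3-1.7B", local_files_only=True)
except LocalEntryNotFoundError:
    # Raised when the repo was never cached; there is no network fallback.
    path = None
```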
tests/test_regression.py (Outdated)
```python
def test_model_from_offline(monkeypatch: pytest.MonkeyPatch):
    # model: https://modelscope.cn/models/qwen/Qwen1.5-0.5B-Chat/summary
    with monkeypatch.context() as m:
        m.setenv("VLLM_USE_MODELSCOPE", "True")
        m.setenv("HF_HUB_OFFLINE", "True")
        # Don't use HF_TOKEN for ModelScope repos, otherwise it will fail
        # with 400 Client Error: Bad Request.
        m.setenv("HF_TOKEN", "")
        llm = LLM(model="Qwen/Qwen3-1.7B")

        prompts = [
            "Hello, my name is",
            "The president of the United States is",
            "The capital of France is",
            "The future of AI is",
        ]
        sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

        outputs = llm.generate(prompts, sampling_params)
        assert len(outputs) == 4
```
The current code directly accesses huggingface, ignoring the HF_HUB_OFFLINE configuration.
If I understand correctly, we can simply check in this test whether the code still tries to open a connection to huggingface?
How can I check whether a huggingface connection is created? Could you tell me? Thanks.
Let's move this regression test to tests/entrypoints/offline_mode/test_offline_mode.py; we will remove test_regression.py soon (#22874). You can refer to its implementation to avoid network connections:
vllm/tests/entrypoints/offline_mode/test_offline_mode.py
Lines 60 to 87 in a353bd0
```python
try:
    m.setenv("HF_HUB_OFFLINE", "1")
    m.setenv("VLLM_NO_USAGE_STATS", "1")

    def disable_connect(*args, **kwargs):
        raise RuntimeError("No http calls allowed")

    m.setattr(
        urllib3.connection.HTTPConnection,
        "connect",
        disable_connect,
    )
    m.setattr(
        urllib3.connection.HTTPSConnection,
        "connect",
        disable_connect,
    )

    # Need to re-import huggingface_hub
    # and friends to setup offline mode
    _re_import_modules()
    # Cached model files should be used in offline mode
    for model_config in MODEL_CONFIGS:
        LLM(**model_config)
finally:
    # Reset the environment after the test
    # NB: Assuming tests are run in online mode
    _re_import_modules()
```
BTW, I wonder why the existing offline-mode test suite didn't catch this issue as well?
According to my testing, there is some logic in EngineArgs that uses the model files, such as (Lines 988 to 991 in 00e3f9d):
```python
if self.speculative_config is None:
    hf_config = get_config(self.hf_config_path or self.model,
                           self.trust_remote_code, self.revision,
                           self.code_revision, self.config_format)
```
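A hedged sketch of how this PR's approach sidesteps that: resolve the model to a local path once when EngineArgs is initialized, so downstream calls like get_config receive a filesystem path and never reach for the Hub. This is an illustrative subset, not the exact merged diff:

```python
from dataclasses import dataclass
from typing import Optional

import huggingface_hub.constants

@dataclass
class EngineArgs:  # illustrative subset of the real class
    model: str
    revision: Optional[str] = None

    def __post_init__(self):
        # In offline mode, swap the repo id for its resolved local snapshot
        # path before get_config() and friends ever see it.
        # get_model_path is the helper from this PR's diff above.
        if huggingface_hub.constants.HF_HUB_OFFLINE:
            self.model = get_model_path(self.model, self.revision)
```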
Force-pushed from 1a32716 to 2115c6d
Force-pushed from 2115c6d to 5f24f33
Force-pushed from 5f24f33 to 4f88322
Force-pushed from 4f88322 to ef92311
Head branch was pushed to by a user without write access
Force-pushed from a297190 to abfffee
Fix CI error after replacing model_id with model_path.
Essential Elements of an Effective PR Description Checklist
- Update supported_models.md and examples for a new model.

Purpose
When 127.0.0.1 localhost www.modelscope.cn huggingface.co is set in /etc/hosts (pointing the Hub domains at localhost to simulate an offline environment) and the model has been downloaded beforehand, running now succeeds.

Test Plan
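A hedged sketch of reproducing the scenario programmatically (equivalent in spirit to the /etc/hosts trick above; assumes the model was downloaded while online beforehand):

```python
import os

# Must be set before huggingface_hub (imported via vllm) is loaded.
os.environ["HF_HUB_OFFLINE"] = "1"

from vllm import LLM, SamplingParams

# With this fix, the model resolves from the local cache instead of
# attempting to reach huggingface.co.
llm = LLM(model="Qwen/Qwen3-1.7B")
outputs = llm.generate(["Hello, my name is"],
                       SamplingParams(temperature=0.8, top_p=0.95))
print(outputs[0].outputs[0].text)
```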
Test Result
(Optional) Documentation Update