Support for NemotronH Nano VLM with an optimized vision model (vLLM native) #23753
base: main
Conversation
Signed-off-by: Daniel Afrimi <[email protected]>
Code Review

This pull request adds support for the NemotronH Nano VLM model, including a native vLLM implementation of the C-RADIO vision encoder. The changes are well-structured, introducing new model files for nano_vlm and radio, along with corresponding tests. My review has identified a critical bug in the new test file that would prevent it from running correctly, a potential TypeError in the model initialization due to unsafe dictionary access, and the use of print statements for debugging, which should be replaced with proper logging. Addressing these points will improve the robustness and maintainability of the new model support.
pixel_values = [
    img_processor(
        images,
        return_tensors='pt').pixel_values.to(torch_dtype)[:, :, :, :640]
    for images in images
]
The list comprehension for pixel_values seems to have a bug. The img_processor is called with a single PIL image, which returns a 3D tensor for pixel_values. However, a 4D slice [:, :, :, :640] is then applied to this 3D tensor, which will cause a runtime error.

To fix this, you can process each image as a list containing a single image so that the pixel_values tensor is 4D. Also, using a more descriptive loop variable name would improve readability, since for images in images shadows the outer variable.
Suggested change:

pixel_values = [
    img_processor(
        [img],
        return_tensors='pt').pixel_values.to(torch_dtype)[:, :, :, :640]
    for img in images
]
model_name = hf_config.args.get("model")
hidden_size, num_layers, num_heads, intermediate_size = vit_dims.get(
    model_name)
The call to vit_dims.get(model_name) can return None if model_name is not found in the vit_dims dictionary. This would cause a TypeError when attempting to unpack the result into hidden_size, num_layers, num_heads, intermediate_size. It's safer to check that the model name exists in the dictionary before retrieving the value.
Suggested change:

model_name = hf_config.args.get("model")
if model_name not in vit_dims:
    raise ValueError(
        f"Unsupported ViT model type for Radio: {model_name}. "
        f"Supported types are: {list(vit_dims.keys())}")
hidden_size, num_layers, num_heads, intermediate_size = vit_dims[
    model_name]
print("in intervit cls token init num_tokens: ", num_tokens) | ||
print("in intervit cls token init num_registers: ", | ||
self.num_registers) |
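These debug print statements should be replaced with vLLM's standard logging, as noted in the review summary. A minimal sketch using vLLM's init_logger (the message wording is illustrative; num_tokens and self.num_registers come from the snippet above):

from vllm.logger import init_logger

logger = init_logger(__name__)

# Replaces the debug prints in the InternViT cls-token initialization.
logger.debug("InternViT cls token init: num_tokens=%d, num_registers=%d",
             num_tokens, self.num_registers)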
This PR builds on #23644.

In addition to supporting the new VL model, it introduces the vision encoder implementation (C-RADIO) using vLLM's native layers. To reduce code duplication, the implementation leverages InternVisionModel blocks.
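For illustration only, a minimal sketch of what reusing InternVisionModel blocks for the C-RADIO encoder could look like. The RadioVisionModel class name and the RADIO-specific pieces mentioned in the comments are assumptions, not the PR's actual code; only InternVisionModel itself is an existing vLLM component.

import torch
import torch.nn as nn

from vllm.model_executor.models.intern_vit import InternVisionModel


class RadioVisionModel(nn.Module):
    """Hypothetical C-RADIO encoder that delegates its transformer
    stack to vLLM's existing InternViT implementation."""

    def __init__(self, config, quant_config=None):
        super().__init__()
        # The shared ViT blocks come from InternVisionModel; RADIO-specific
        # parts (e.g. register tokens, input conditioning) would be layered
        # around this shared stack rather than reimplemented.
        self.vision_model = InternVisionModel(config,
                                              quant_config=quant_config)

    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
        return self.vision_model(pixel_values=pixel_values)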