Adapting Qwen3-32B to Eagle3 mode to resolve head dimension mismatch issues #23740

coder-fny · 2025-08-27T11:22:38Z

Purpose

The adapted weights from https://www.modelscope.cn/models/AngelSlim/Qwen3-32B_eagle3/ can perform Eagle3 inference, but the head dimension (80) of these weights is inconsistent with that of the target model (128), causing an error; therefore, adaptation is required.

Test Plan

Test Result

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

gemini-code-assist

Code Review

This pull request introduces changes to handle KV cache configurations for models with variable head dimensions, specifically for Qwen3-32B in Eagle3 mode. It adds a new path for creating KV cache configurations when all layers use FullAttentionSpec but may have different head_size values. While the memory allocation logic correctly uses the maximum page size, the logic for creating the KVCacheGroupSpec is flawed. It uses the spec of the first layer to represent the entire group, which can lead to incorrect concurrency calculations and potential runtime errors if other layers have larger memory requirements. My review includes a high-severity fix for this issue to ensure the group's specification reflects the most demanding layer, preventing potential out-of-memory errors.

gemini-code-assist · 2025-08-27T11:24:44Z

vllm/v1/core/kv_cache_utils.py

+        layer_specs = [
+            kv_cache_spec[layer_name] for layer_name in layer_names_one_group
+        ]
+        merged_layer_spec = layer_specs[0]


Using the spec of the first layer (layer_specs[0]) to represent the entire group can be problematic when layers have different head_size values, as indicated by the PR's purpose. This leads to an incorrect KVCacheGroupSpec for the group.

For instance, get_max_concurrency_for_kv_cache_config uses group.kv_cache_spec.page_size_bytes for its calculation. If the first layer has a smaller head_size than other layers in the group, its page_size_bytes will be smaller than the maximum page size used for allocation in _get_kv_cache_config_allFullAttentionSpec_type. This will cause max_concurrency to be overestimated, which could lead to out-of-memory errors at runtime.

To ensure correctness, the merged_layer_spec should represent the most demanding configuration within the group, which corresponds to the layer with the largest page_size_bytes.

Suggested change

merged_layer_spec = layer_specs[0]

merged_layer_spec = max(layer_specs, key=lambda spec: spec.page_size_bytes)

github-actions · 2025-08-27T11:26:47Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors.

You ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

… mismatch issues Signed-off-by: funanyang <[email protected]>

coder-fny requested review from WoosukKwon, robertgshaw2-redhat, njhill, ywang96, comaniac and alexm-redhat as code owners August 27, 2025 11:22

mergify bot added qwen Related to Qwen models v1 labels Aug 27, 2025

coder-fny changed the title ~~Adapting Qwen3-32B to Eagle3 mode to resolve head dimension (headdim)…~~ Adapting Qwen3-32B to Eagle3 mode to resolve head dimension (headdim）different Aug 27, 2025

gemini-code-assist bot reviewed Aug 27, 2025

View reviewed changes

coder-fny mentioned this pull request Aug 27, 2025

[feat] Adapting Qwen3-32B to Eagle3 mode to resolve head dimension (headdim) mismatch issues vllm-project/vllm-ascend#2578

Open

coder-fny changed the title ~~Adapting Qwen3-32B to Eagle3 mode to resolve head dimension (headdim）different~~ Adapting Qwen3-32B to Eagle3 mode to resolve head dimension mismatch issues Aug 27, 2025

Adapting Qwen3-32B to Eagle3 mode to resolve head dimension (headdim)…

498becf

… mismatch issues Signed-off-by: funanyang <[email protected]>

coder-fny force-pushed the qwen3_dense_eagle3 branch from bde497b to 498becf Compare August 27, 2025 11:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Adapting Qwen3-32B to Eagle3 mode to resolve head dimension mismatch issues #23740

Adapting Qwen3-32B to Eagle3 mode to resolve head dimension mismatch issues #23740

coder-fny commented Aug 27, 2025 •

edited by github-actions bot

Loading

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Aug 27, 2025

Uh oh!

github-actions bot commented Aug 27, 2025

Uh oh!

Uh oh!

	merged_layer_spec = layer_specs[0]
	merged_layer_spec = max(layer_specs, key=lambda spec: spec.page_size_bytes)

Uh oh!

Adapting Qwen3-32B to Eagle3 mode to resolve head dimension mismatch issues #23740

Are you sure you want to change the base?

Adapting Qwen3-32B to Eagle3 mode to resolve head dimension mismatch issues #23740

Conversation

coder-fny commented Aug 27, 2025 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Purpose

Test Plan

Test Result

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Aug 27, 2025

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Aug 27, 2025

Uh oh!

Uh oh!

coder-fny commented Aug 27, 2025 •

edited by github-actions bot

Loading