Conversation

@cboss6 commented Aug 27, 2025

Purpose

This PR introduces Zigzag, a novel static expert load-balancing placement strategy designed for MoE models with multiple expert groups, such as the DeepSeek series.

[Screenshot: expert selection heatmap]

Through heatmap analysis, we observed that in multi-expert-group MoE models such as DeepSeek, experts within the same group tend to be selected together in practice. Distributing those experts across different devices therefore brings performance benefits.

The zigzag expert placement feature has been validated on DeepSeek-R1, showing a ~8% improvement in QPM (queries per minute) over the default placement in our online serving benchmark on a single node with 8x H20 GPUs.

The zigzag strategy optimizes how experts are distributed across parallel ranks by using a staggered placement pattern, which achieves better load balancing across expert-parallel groups. This is particularly beneficial for models that use grouped top-k routing, where experts are organized into logical groups and routing decisions are made within those groups. By spreading each group's experts more evenly across ranks, the strategy reduces load imbalance and improves overall throughput in production serving.
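The sketch below illustrates the general idea only; it is not the PR's actual implementation, and the function name and layout rule are hypothetical. With a contiguous layout, a whole expert group can land on one rank; a staggered (zigzag) layout spreads each group's experts across ranks.

```python
def zigzag_placement_sketch(num_experts: int, num_groups: int, num_ranks: int) -> dict[int, list[int]]:
    """Hypothetical illustration of a staggered (zigzag) expert layout.

    Experts belonging to the same group are assigned to different ranks,
    with a per-group offset so that groups do not all start on rank 0.
    """
    group_size = num_experts // num_groups
    placement: dict[int, list[int]] = {rank: [] for rank in range(num_ranks)}
    for group in range(num_groups):
        for slot in range(group_size):
            expert_id = group * group_size + slot
            # Stagger the starting rank per group so co-activated experts
            # from one group do not pile up on the same device.
            rank = (slot + group) % num_ranks
            placement[rank].append(expert_id)
    return placement

# Example: 16 experts in 4 groups over 4 expert-parallel ranks.
# Contiguous placement would put each whole group on a single rank;
# this zigzag layout gives every rank one expert from every group.
print(zigzag_placement_sketch(num_experts=16, num_groups=4, num_ranks=4))
```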

Performance

Tested on a single node with 8x H20 GPUs, using input_len=3500, output_len=1000, request_rate=32, max_concurrency=32, and num_prompts=3200.
[Screenshot: serving benchmark results]

Accuracy Test

| Dataset | Accuracy (baseline) | Accuracy (zigzag-MR) |
| --- | --- | --- |
| AIME24 | 79.80% | 80.00% |
| GPQA | 71.50% | 71.21% |
| MATH500 | 97.30% | 95.20% |

Usage

To try out the Zigzag static EPLB strategy, enable it with the following options:

```python
from vllm import LLM, SamplingParams

model_path = '/model/path/to/DeepSeek/series'
model = LLM(
    model=model_path,
    enable_expert_parallel=True,
    enable_eplb=True,
    enable_zigzag_expert_placement=True,
)
```
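Inference then works the same as with any other vLLM model; zigzag only changes how experts are laid out across ranks. A minimal continuation of the snippet above (the prompt and sampling settings are purely illustrative):

```python
from vllm import SamplingParams

# Continuing from the snippet above: `model` is the configured LLM instance.
sampling_params = SamplingParams(temperature=0.6, max_tokens=256)
outputs = model.generate(["Briefly explain mixture-of-experts routing."], sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```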

Compatibility

The zigzag pattern is designed for MoE models with multiple expert groups, such as the DeepSeek series. Note that MoE models without expert groups do not benefit from this method.
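As a quick sanity check for whether a checkpoint has expert groups at all, one can inspect its Hugging Face config. DeepSeek-V3/R1 configs expose the group count as `n_group`; other model families may use a different field name, so treat this as a rough sketch rather than a general rule:

```python
from transformers import AutoConfig

# Assumes a DeepSeek-style config that exposes `n_group`.
config = AutoConfig.from_pretrained("/model/path/to/DeepSeek/series", trust_remote_code=True)
num_groups = getattr(config, "n_group", 1)
if num_groups and num_groups > 1:
    print(f"{num_groups} expert groups found; zigzag placement may help.")
else:
    print("No expert groups; zigzag placement is not expected to help.")
```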

@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request introduces a novel "Zigzag" static expert placement strategy for MoE models, which is a welcome performance optimization. The implementation is mostly sound, with the necessary configuration options and logic added. My review includes two main points of feedback. Firstly, there's some redundant code in the new zigzag placement logic that can be removed to improve clarity and correctness. Secondly, the assertion for validating the zigzag placement configuration could be improved by splitting it into multiple assertions with more specific error messages, which would enhance the developer experience when debugging configuration issues.

@cboss6 force-pushed the cboss/zigzag-vllm branch from f2add2d to a89dcd3 on August 27, 2025 13:27
@DarkLight1337 (Member) commented:

You can fix the pre-commit about .md files by merging from main

cboss6 and others added 2 commits August 28, 2025 10:44
@cboss6 (Author) commented Aug 28, 2025

> You can fix the pre-commit about .md files by merging from main

Done. Could you please take another look? Thanks.
