[Feat] A novel static EPLB placement strategy for MoE models. #23745
Purpose
This PR introduces a novel static expert load balancing (EPLB) placement strategy, called Zigzag, designed for MoE models with multiple expert groups, such as the DeepSeek series.
Through heatmap analysis, we observed that in multi-expert-group MoE models such as DeepSeek, experts within the same group tend to be selected together in practice, so distributing them across different devices can bring performance benefits.
The zigzag expert placement feature has been validated on DeepSeek-R1: in our online serving benchmark on a single node with 8x H20 GPUs, it achieved a ~8% improvement in QPM (queries per minute) over the default placement.
The zigzag strategy staggers expert placement across expert-parallel ranks, which improves load balancing across expert parallel groups. It is particularly beneficial for models that use grouped top-k routing, where experts are organized into logical groups and routing decisions are made within those groups. By spreading each group's experts evenly over the ranks, the strategy reduces load imbalance and improves overall throughput in production serving.
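The intuition can be illustrated with a small sketch. This is not the PR's implementation (see the diff for the actual mapping); it is a minimal, hypothetical illustration of a staggered placement, assuming experts are numbered contiguously within each group:

```python
# Hypothetical sketch of a "zigzag"-style static placement: experts that share
# a group are spread across expert-parallel (EP) ranks instead of being packed
# onto the same device. The mapping used by this PR may differ; this only
# illustrates the staggered pattern described above.

def zigzag_placement(num_experts: int, num_groups: int, num_ranks: int) -> list[list[int]]:
    """Return, for each EP rank, the global expert ids assigned to it."""
    assert num_experts % num_groups == 0
    experts_per_group = num_experts // num_groups
    placement: list[list[int]] = [[] for _ in range(num_ranks)]
    for group in range(num_groups):
        for slot in range(experts_per_group):
            expert_id = group * experts_per_group + slot
            # Stagger the starting rank per group so that co-activated experts
            # from the same group land on different devices.
            rank = (slot + group) % num_ranks
            placement[rank].append(expert_id)
    return placement

# Example shape roughly matching DeepSeek-R1 (256 routed experts, 8 groups)
# on 8 EP ranks: each rank receives 4 experts from every group.
for rank, experts in enumerate(zigzag_placement(256, 8, 8)):
    print(f"rank {rank}: {experts[:4]} ...")
```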
Performance
Tested on a single node with 8x H20 GPUs, using input_len=3500, output_len=1000, request_rate=32, max_concurrency=32, num_prompts=3200.

Accuracy Test
Usage
To try out the Zigzag static EPLB strategy, enable it with the following options:
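As a rough illustration, a hypothetical offline-inference setup might look like the sketch below; `enable_expert_parallel` is an existing engine argument, while the option that selects the zigzag placement is an assumption here and may be named differently in the PR itself:

```python
from vllm import LLM, SamplingParams

# Hypothetical sketch. `tensor_parallel_size` and `enable_expert_parallel` are
# existing engine arguments; the option that actually selects the zigzag static
# placement is an assumption (see the PR diff for the real flag).
llm = LLM(
    model="deepseek-ai/DeepSeek-R1",
    tensor_parallel_size=8,
    enable_expert_parallel=True,
    # e.g. something along the lines of (assumed name, not confirmed by this PR):
    # additional_config={"expert_placement_strategy": "zigzag"},
)

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```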
Compatibility
The zigzag pattern is designed for MoE models with multiple expert groups, such as the DeepSeek series. Note that MoE models without expert groups do not benefit from this method.
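For reference, whether a model has multiple expert groups can be checked from its Hugging Face config; the `n_group` field name below is taken from the DeepSeek-style configs and is an assumption for other model families:

```python
from transformers import AutoConfig

# Assumption: DeepSeek-style configs expose the number of expert groups as
# `n_group`. Models without this field (or with a single group) have no expert
# grouping, so the zigzag placement would not help them.
cfg = AutoConfig.from_pretrained("deepseek-ai/DeepSeek-R1", trust_remote_code=True)
num_groups = getattr(cfg, "n_group", None) or 1
print("multiple expert groups:", num_groups > 1)
```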