[Feat] A novel static EPLB placement strategy for MoE models. #23745
Purpose
This PR introduces a novel static expert load balancing (EPLB) placement strategy, called Zigzag, designed for MoE models with multiple expert groups, such as the DeepSeek series.
Through heatmap analysis, we observed that in multi-expert-group MoE models such as DeepSeek, experts within the same group tend to be selected together in practice, so distributing them across different devices can bring performance benefits.
The zigzag expert placement feature has been validated on DeepSeek-R1: in our online serving benchmark on a single node with 8x H20 GPUs, it achieved a ~8% improvement in QPM (queries per minute) over the default placement.
The zigzag strategy staggers expert placement across expert-parallel ranks, which improves load balancing across expert parallel groups. It is particularly beneficial for models that use grouped top-k routing, where experts are organized into logical groups and routing decisions are made within those groups. By spreading each group's experts evenly over the ranks, the strategy reduces load imbalance and improves overall throughput in production serving.
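The intuition can be illustrated with a small sketch. This is not the PR's implementation (see the diff for the actual mapping); it is a minimal, hypothetical illustration of a staggered placement, assuming experts are numbered contiguously within each group:

```python
# Hypothetical sketch of a "zigzag"-style static placement: experts that share
# a group are spread across expert-parallel (EP) ranks instead of being packed
# onto the same device. The mapping used by this PR may differ; this only
# illustrates the staggered pattern described above.

def zigzag_placement(num_experts: int, num_groups: int, num_ranks: int) -> list[list[int]]:
    """Return, for each EP rank, the global expert ids assigned to it."""
    assert num_experts % num_groups == 0
    experts_per_group = num_experts // num_groups
    placement: list[list[int]] = [[] for _ in range(num_ranks)]
    for group in range(num_groups):
        for slot in range(experts_per_group):
            expert_id = group * experts_per_group + slot
            # Stagger the starting rank per group so that co-activated experts
            # from the same group land on different devices.
            rank = (slot + group) % num_ranks
            placement[rank].append(expert_id)
    return placement

# Example shape roughly matching DeepSeek-R1 (256 routed experts, 8 groups)
# on 8 EP ranks: each rank receives 4 experts from every group.
for rank, experts in enumerate(zigzag_placement(256, 8, 8)):
    print(f"rank {rank}: {experts[:4]} ...")
```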
Performance
Tested on a single node with 8x H20 GPUs, using input_len=3500, output_len=1000, request_rate=32, max_concurrency=32, num_prompts=3200.

Accuracy Test
Usage
To try out the Zigzag static EPLB strategy, enable it with the following options:
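As a rough illustration, a hypothetical offline-inference setup might look like the sketch below; `enable_expert_parallel` is an existing engine argument, while the option that selects the zigzag placement is an assumption here and may be named differently in the PR itself:

```python
from vllm import LLM, SamplingParams

# Hypothetical sketch. `tensor_parallel_size` and `enable_expert_parallel` are
# existing engine arguments; the option that actually selects the zigzag static
# placement is an assumption (see the PR diff for the real flag).
llm = LLM(
    model="deepseek-ai/DeepSeek-R1",
    tensor_parallel_size=8,
    enable_expert_parallel=True,
    # e.g. something along the lines of (assumed name, not confirmed by this PR):
    # additional_config={"expert_placement_strategy": "zigzag"},
)

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```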
Compatibility
The zigzag pattern is designed for MoE models with multiple expert groups, such as the DeepSeek series. Note that MoE models without expert groups do not benefit from this method.
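For reference, whether a model has multiple expert groups can be checked from its Hugging Face config; the `n_group` field name below is taken from the DeepSeek-style configs and is an assumption for other model families:

```python
from transformers import AutoConfig

# Assumption: DeepSeek-style configs expose the number of expert groups as
# `n_group`. Models without this field (or with a single group) have no expert
# grouping, so the zigzag placement would not help them.
cfg = AutoConfig.from_pretrained("deepseek-ai/DeepSeek-R1", trust_remote_code=True)
num_groups = getattr(cfg, "n_group", None) or 1
print("multiple expert groups:", num_groups > 1)
```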