Use Blackwell FlashInfer MXFP4 MoE by default if available #23008

mgoin · 2025-08-15T21:29:39Z

Purpose

Essentially, on SM100 (Blackwell) GPUs with FlashInfer installed, vLLM should behave as if VLLM_USE_FLASHINFER_MOE_MXFP4_BF16=1 were set, instead of requiring users to opt in.
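Here is a minimal sketch of that default. It assumes the compute capability is encoded as major*10 + minor (100 for SM100) and that an explicitly set VLLM_USE_FLASHINFER_MOE_MXFP4_BF16 still takes precedence. The helper names flashinfer_installed and use_flashinfer_mxfp4_bf16 are hypothetical, not vLLM functions. The FlashInfer check only asks importlib whether the package can be found.

```python
import importlib.util
import os
from typing import Mapping, Optional

ENV_VAR = "VLLM_USE_FLASHINFER_MOE_MXFP4_BF16"


def flashinfer_installed() -> bool:
    # Checks whether the flashinfer package can be found, without importing it.
    return importlib.util.find_spec("flashinfer") is not None


def use_flashinfer_mxfp4_bf16(
    capability: int,
    has_flashinfer: bool,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Hypothetical helper. `capability` is major*10 + minor (100 == SM100)."""
    env = os.environ if env is None else env
    explicit = env.get(ENV_VAR)
    if explicit is not None:
        # Assumption: an explicit "1"/"0" overrides the automatic default.
        return explicit == "1"
    # New default: enable on exactly SM100 when FlashInfer is importable.
    return capability == 100 and has_flashinfer


print(use_flashinfer_mxfp4_bf16(100, True, env={}))  # True
print(use_flashinfer_mxfp4_bf16(90, True, env={}))   # False
```

In this sketch, the exact-match capability check limits the automatic default to SM100. Other GPUs still need the variable set explicitly.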

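Test Plan

Run lm_eval GSM8K (5-shot) on openai/gpt-oss-20b. On main, run once without and once with VLLM_USE_FLASHINFER_MOE_MXFP4_BF16=1. On this PR, run without the variable.

Test Result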

Main branch, without the variable (weight loading fails because triton_kernels is not installed):

lm_eval --model vllm --model_args pretrained=openai/gpt-oss-20b --trust_remote_code --tasks gsm8k --num_fewshot 5 --batch_size auto
(EngineCore_0 pid=4179585)   File "/home/mgoin/code/vllm/vllm/model_executor/layers/quantization/mxfp4.py", line 356, in process_weights_after_loading
(EngineCore_0 pid=4179585)     from triton_kernels.matmul_ogs import FlexCtx, PrecisionConfig
(EngineCore_0 pid=4179585) ModuleNotFoundError: No module named 'triton_kernels'

VLLM_USE_FLASHINFER_MOE_MXFP4_BF16=1 lm_eval --model vllm --model_args pretrained=openai/gpt-oss-20b --trust_remote_code --tasks gsm8k --num_fewshot 5 --batch_size auto
vllm (pretrained=openai/gpt-oss-20b,trust_remote_code=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.3647|±  |0.0133|
|     |       |strict-match    |     5|exact_match|↑  |0.2328|±  |0.0116|

This PR, without the variable (scores match the VLLM_USE_FLASHINFER_MOE_MXFP4_BF16=1 run on main):

lm_eval --model vllm --model_args pretrained=openai/gpt-oss-20b --trust_remote_code --tasks gsm8k --num_fewshot 5 --batch_size auto
vllm (pretrained=openai/gpt-oss-20b,trust_remote_code=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.3647|±  |0.0133|
|     |       |strict-match    |     5|exact_match|↑  |0.2328|±  |0.0116|


Signed-off-by: mgoin <[email protected]>


gemini-code-assist

Code Review

This pull request enables the Blackwell FlashInfer MXFP4 MoE backend by default when available. The changes introduce helper functions to centralize the logic for determining which FlashInfer backend to use and refactor the existing code to call these helpers. A helpful warning is also added for users on Blackwell hardware when FlashInfer is not installed. The overall changes are well-structured. I've identified one high-severity issue regarding the prioritization of FlashInfer backends, which could lead to using a less performant kernel even when a faster one is enabled. A code suggestion is provided to address this.
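The prioritization concern can be illustrated with a minimal sketch. The backend names and the Triton fallback are hypothetical placeholders, and this is not vLLM's actual selection code. Explicitly enabled backends are checked first, highest priority first, so a faster opt-in kernel is never shadowed by the default-on FlashInfer MXFP4 BF16 path. Blackwell users without FlashInfer get a warning.

```python
import logging
from typing import Sequence, Tuple

logger = logging.getLogger(__name__)

DEFAULT_BACKEND = "flashinfer_mxfp4_bf16"  # hypothetical name
FALLBACK_BACKEND = "triton"                # hypothetical name


def pick_mxfp4_moe_backend(
    explicit: Sequence[Tuple[str, bool]],
    is_blackwell: bool,
    has_flashinfer: bool,
) -> str:
    """Hypothetical backend choice; `explicit` holds (name, enabled) pairs,
    highest priority first."""
    # 1. Honor explicit opt-ins before any default, so an enabled faster
    #    kernel always wins.
    for name, enabled in explicit:
        if enabled:
            return name
    # 2. Default-on FlashInfer path when the hardware and package allow it.
    if is_blackwell and has_flashinfer:
        return DEFAULT_BACKEND
    # 3. Otherwise fall back, warning Blackwell users that FlashInfer is missing.
    if is_blackwell:
        logger.warning(
            "FlashInfer is not installed; using the %s MoE backend instead.",
            FALLBACK_BACKEND,
        )
    return FALLBACK_BACKEND


print(pick_mxfp4_moe_backend([("faster_backend", True)], True, True))   # faster_backend
print(pick_mxfp4_moe_backend([("faster_backend", False)], True, True))  # flashinfer_mxfp4_bf16
```

The explicit opt-ins must come before the default. If the default check ran first, it would return on every Blackwell machine with FlashInfer, and the opt-in list would never be consulted.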

yewentao256

LGTM, thanks for the work!


celsowm · 2025-08-18T23:55:43Z

Is sm120 / RTX 50xx finally going to work too?


SabareeshGC · 2025-09-17T17:39:54Z

vllm/model_executor/layers/quantization/mxfp4.py

        if current_platform.is_cuda() and \
-                not current_platform.has_device_capability(100):
-            if not current_platform.is_device_capability(90):
+                not current_platform.is_device_capability(100):


There are SM120 GPUs, such as the RTX 6000 Pro Blackwell; shouldn't we include those as well?
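The diff swaps an "at least SM100" check for an "exactly SM100" check, which is why SM120 parts drop out. Here is a minimal sketch of the two semantics. It assumes that vLLM's has_device_capability does a >= comparison and is_device_capability does an == comparison, on capabilities encoded as major*10 + minor. The functions below are standalone stand-ins, not vLLM's implementation.

```python
def has_capability(current: int, required: int) -> bool:
    # "At least" check: any newer architecture also passes.
    return current >= required


def is_capability(current: int, required: int) -> bool:
    # Exact-match check: only that one architecture passes.
    return current == required


# Hopper (9.0), Blackwell datacenter (10.0), Blackwell RTX/workstation (12.0)
for name, cap in [("SM90", 90), ("SM100", 100), ("SM120", 120)]:
    print(name, has_capability(cap, 100), is_capability(cap, 100))
# prints:
# SM90 False False
# SM100 True True
# SM120 True False
```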

mgoin added 2 commits August 15, 2025 17:17

Use flashinfer mxfp4 moe by default if available

bccf6e6

Signed-off-by: mgoin <[email protected]>

Fix condition

9d91532

Signed-off-by: mgoin <[email protected]>

mgoin requested review from robertgshaw2-redhat, tlrmchlsmth and yewentao256 as code owners August 15, 2025 21:29

gemini-code-assist bot reviewed Aug 15, 2025


yewentao256 approved these changes Aug 16, 2025


Merge branch 'main' into mxfp4-blackwell-flashinfer-default

db3f1a2

mgoin added the ready label (ONLY add when PR is ready to merge/full CI is needed) Aug 18, 2025

Merge branch 'main' into mxfp4-blackwell-flashinfer-default

d29cd37

simon-mo merged commit 6d25e3f into vllm-project:main Aug 18, 2025
43 of 47 checks passed

simon-mo pushed a commit that referenced this pull request Aug 18, 2025

Use Blackwell FlashInfer MXFP4 MoE by default if available (#23008)

aab5498

Signed-off-by: mgoin <[email protected]>

princepride pushed a commit to princepride/vllm that referenced this pull request Aug 20, 2025

Use Blackwell FlashInfer MXFP4 MoE by default if available (vllm-proj…

49629d4

…ect#23008) Signed-off-by: mgoin <[email protected]>

divakar-amd pushed a commit to divakar-amd/vllm_upstream that referenced this pull request Aug 20, 2025

Use Blackwell FlashInfer MXFP4 MoE by default if available (vllm-proj…

9c4f6b3

…ect#23008) Signed-off-by: mgoin <[email protected]>

cyang49 pushed a commit to cyang49/vllm that referenced this pull request Aug 20, 2025

Use Blackwell FlashInfer MXFP4 MoE by default if available (vllm-proj…

5c9fb6f

…ect#23008) Signed-off-by: mgoin <[email protected]>

djmmoss pushed a commit to djmmoss/vllm that referenced this pull request Aug 21, 2025

Use Blackwell FlashInfer MXFP4 MoE by default if available (vllm-proj…

481be6d

…ect#23008) Signed-off-by: mgoin <[email protected]>

epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 28, 2025

Use Blackwell FlashInfer MXFP4 MoE by default if available (vllm-proj…

2e2beb0

…ect#23008) Signed-off-by: mgoin <[email protected]>

xiao-llm pushed a commit to xiao-llm/vllm that referenced this pull request Aug 28, 2025

Use Blackwell FlashInfer MXFP4 MoE by default if available (vllm-proj…

bb6d740

…ect#23008) Signed-off-by: mgoin <[email protected]> Signed-off-by: Xiao Yu <[email protected]>

zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Aug 28, 2025

Use Blackwell FlashInfer MXFP4 MoE by default if available (vllm-proj…

4413d5e

…ect#23008) Signed-off-by: mgoin <[email protected]>

mengxingkongzhouhan pushed a commit to mengxingkongzhouhan/vllm that referenced this pull request Aug 30, 2025

Use Blackwell FlashInfer MXFP4 MoE by default if available (vllm-proj…

75e9fbf

…ect#23008) Signed-off-by: mgoin <[email protected]>

zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Sep 3, 2025

Use Blackwell FlashInfer MXFP4 MoE by default if available (vllm-proj…

e6d98eb

…ect#23008) Signed-off-by: mgoin <[email protected]>

SabareeshGC reviewed Sep 17, 2025


5 participants
