Does SGLang support Qwen3 MOE FP8 BLOCK? #1758

@wangwenmingaa

Description

Following vllm-project/vllm#21404, I quantized the Qwen3-30B-A3B model with block-wise FP8 (FP8 BLOCK) quantization. However, I found that the quantized model cannot run inference with SGLang.

My questions are:

Does SGLang currently support Qwen3 MoE models quantized with block-wise FP8?
If both are supported, which offers better inference performance: Qwen3 MoE with block-wise FP8 or with per-tensor FP8?

Any guidance or insights would be greatly appreciated!
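For reference, one way to tell whether a checkpoint was produced with block-wise FP8 (as opposed to per-tensor FP8) is to inspect the `quantization_config` section of its `config.json`: block-wise FP8 checkpoints in the DeepSeek-V3 style carry a `weight_block_size` entry such as `[128, 128]`. A minimal sketch, assuming that common checkpoint layout (the helper name `describe_fp8_scheme` is hypothetical):

```python
import json
from pathlib import Path

def describe_fp8_scheme(model_dir: str) -> str:
    """Report whether a checkpoint uses block-wise or per-tensor FP8.

    Assumes the common layout where config.json carries a
    `quantization_config` dict; block-wise FP8 checkpoints
    (DeepSeek-V3 style) include a `weight_block_size` entry
    such as [128, 128].
    """
    cfg = json.loads(Path(model_dir, "config.json").read_text())
    qcfg = cfg.get("quantization_config")
    if qcfg is None:
        return "unquantized"
    if qcfg.get("quant_method") != "fp8":
        return qcfg.get("quant_method", "unknown")
    block = qcfg.get("weight_block_size")
    return f"fp8 block-wise {block}" if block else "fp8 per-tensor"
```

If the backend reports an unsupported quantization scheme, this check at least confirms which FP8 variant the checkpoint actually contains.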
