Does SGLang support Qwen3 MOE FP8 BLOCK? #1758

@wangwenmingaa

Description

Following vllm-project/vllm#21404, I quantized the Qwen3-30B-A3B model with block-wise FP8 (FP8 BLOCK) quantization. However, I found that the quantized model cannot run inference with SGLang.

My questions are:

Does SGLang currently support Qwen3 MoE models quantized with block-wise FP8?
If both are supported, which offers better inference performance: Qwen3 MoE with block-wise FP8 or with per-tensor FP8?

Any guidance or insights would be greatly appreciated!
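For reference, one way to tell whether a checkpoint was produced with block-wise FP8 (as opposed to per-tensor FP8) is to inspect the `quantization_config` section of its `config.json`: block-wise FP8 checkpoints in the DeepSeek-V3 style carry a `weight_block_size` entry such as `[128, 128]`. A minimal sketch, assuming that common checkpoint layout (the helper name `describe_fp8_scheme` is hypothetical):

```python
import json
from pathlib import Path

def describe_fp8_scheme(model_dir: str) -> str:
    """Report whether a checkpoint uses block-wise or per-tensor FP8.

    Assumes the common layout where config.json carries a
    `quantization_config` dict; block-wise FP8 checkpoints
    (DeepSeek-V3 style) include a `weight_block_size` entry
    such as [128, 128].
    """
    cfg = json.loads(Path(model_dir, "config.json").read_text())
    qcfg = cfg.get("quantization_config")
    if qcfg is None:
        return "unquantized"
    if qcfg.get("quant_method") != "fp8":
        return qcfg.get("quant_method", "unknown")
    block = qcfg.get("weight_block_size")
    return f"fp8 block-wise {block}" if block else "fp8 per-tensor"
```

If the backend reports an unsupported quantization scheme, this check at least confirms which FP8 variant the checkpoint actually contains.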
