Can a model quantized to dynamic FP8 with the llmcompressor library be served with SGLang? I encountered an error during inference. The quantization code is as follows:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.utils import dispatch_for_generation
MODEL_ID = "/open_source/Qwen3-30B-A3B"
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# Configure the quantization algorithm and scheme.
# In this case, we:
# * quantize the weights to FP8 with per-channel scales via PTQ
# * quantize the activations to FP8 with dynamic per-token scales
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=["lm_head", "re:.*mlp.gate$", "re:.*mlp.shared_expert_gate$"],
)
output_dir = "./Qwen3-30B-A3B-FP8-DYNAMIC-0819"
# Apply quantization.
oneshot(
    model=model,
    recipe=recipe,
    save_compressed=True,
    trust_remote_code_model=True,
    output_dir=output_dir,
)
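Before launching the server, a short generation on the quantized model can confirm the checkpoint itself is sound. A minimal sketch using the dispatch_for_generation helper already imported above (the prompt and token count are arbitrary):

# Optional sanity check: dispatch the quantized model across the
# available GPUs and run a short generation before serving it.
dispatch_for_generation(model)
sample = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
output = model.generate(**sample, max_new_tokens=32)
print(tokenizer.decode(output[0]))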
Execution command:
CUDA_VISIBLE_DEVICES=0 python -m sglang.launch_server --model ./Qwen3-30B-A3B-FP8-DYNAMIC-0819 --mem-fraction-static 0.8 --host 0.0.0.0 --port 8801 --context-length 4200 --enable-ep-moe
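Once the server is up, the failing inference path can be exercised with a plain request against SGLang's OpenAI-compatible endpoint. A minimal sketch (host, port, and model path match the command above; the prompt is arbitrary):

import requests

# Smoke test against the OpenAI-compatible chat endpoint exposed by
# the launch command above (port 8801).
resp = requests.post(
    "http://localhost:8801/v1/chat/completions",
    json={
        "model": "./Qwen3-30B-A3B-FP8-DYNAMIC-0819",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 32,
    },
    timeout=60,
)
print(resp.status_code, resp.json())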