Labels
AutoDeploy (<NV> AutoDeploy Backend)
Description
Currently, AutoDeploy detects quantized models through two separate mechanisms:
- config parsing (e.g. via _load_quantization_config() in the HF factory)
- quantized-node detection during graph transformation
This split introduces redundancy and adds complexity when onboarding new quantization formats from sources other than ModelOpt.
We want to introduce a base class that unifies detection, transformation, and parameter resolution across all quantization sources.
Goals:
- Support any quantization source (ModelOpt, native graph, etc.) via a unified handler.
- Remove the scattered hf_quant_config.json handling from the HF factory.
- Make quantization transformation format-agnostic and modular.
- Reduce copy-paste code when adding new formats.
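To make the goals above more concrete, here is a minimal sketch of what a unified handler interface with a registry could look like. Every name in it (`QuantHandler`, `register_quant_handler`, `find_handler`, `ModelOptHandler`) is hypothetical and not taken from the AutoDeploy codebase, and the config layout (`producer.name`, `quantization.quant_algo`) is an assumption about what hf_quant_config.json contains. The `transform` step only returns a string here; a real handler would rewrite graph nodes.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional, Type

# Hypothetical registry: maps a quantization source name to its handler class.
_QUANT_HANDLERS: Dict[str, Type["QuantHandler"]] = {}


def register_quant_handler(name: str):
    """Class decorator that records a handler class under `name`."""
    def decorator(cls):
        _QUANT_HANDLERS[name] = cls
        return cls
    return decorator


class QuantHandler(ABC):
    """Hypothetical base class: detection, parameter resolution, transformation."""

    @classmethod
    @abstractmethod
    def detect(cls, config: Dict[str, Any]) -> bool:
        """Return True if this handler recognizes the given config."""

    @abstractmethod
    def resolve_params(self, config: Dict[str, Any]) -> Dict[str, Any]:
        """Extract the parameters the transformation needs."""

    @abstractmethod
    def transform(self, layer_name: str, params: Dict[str, Any]) -> str:
        """Apply the format-specific rewrite (stubbed as a string here)."""


@register_quant_handler("modelopt")
class ModelOptHandler(QuantHandler):
    @classmethod
    def detect(cls, config):
        # Assumed config layout; adjust to the real file format.
        return config.get("producer", {}).get("name") == "modelopt"

    def resolve_params(self, config):
        return {"quant_algo": config.get("quantization", {}).get("quant_algo")}

    def transform(self, layer_name, params):
        return f"{layer_name}:{params['quant_algo']}"


def find_handler(config: Dict[str, Any]) -> Optional[QuantHandler]:
    """Return an instance of the first registered handler that detects `config`."""
    for cls in _QUANT_HANDLERS.values():
        if cls.detect(config):
            return cls()
    return None
```

With this shape, the HF factory would only need to call something like `find_handler(config)` instead of carrying format-specific parsing, and adding a format would mean adding one decorated class.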
Subtasks:
- Introduce New Interface and Registry
- Migrate ModelOpt + FP8/NVFP4 Handler
- Refactor HF Factory Logic
- Refactor Graph Quantization Pass
- Validate with a new quantization format
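As a rough illustration of the last two subtasks, the sketch below shows a format-agnostic graph pass that asks registered matchers whether a node is quantized, and then onboards a made-up format with one extra registry entry. It is a toy under stated assumptions: `Node` is a stand-in dataclass rather than a real graph node type, and the op names (`fp8_linear`, `nvfp4_linear`, `int8_demo_linear`) and the `NODE_MATCHERS` registry are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """Toy stand-in for a graph node (not a real graph IR type)."""
    name: str
    op: str
    meta: Dict[str, str] = field(default_factory=dict)


# Hypothetical registry: format name -> predicate that recognizes its nodes.
NODE_MATCHERS: Dict[str, Callable[[Node], bool]] = {
    "fp8": lambda n: n.op == "fp8_linear",
    "nvfp4": lambda n: n.op == "nvfp4_linear",
}


def quantization_pass(graph: List[Node]) -> Dict[str, int]:
    """Tag each node with the first matching format; return counts per format."""
    counts: Dict[str, int] = {}
    for node in graph:
        for fmt, matches in NODE_MATCHERS.items():
            if matches(node):
                node.meta["quant_format"] = fmt
                counts[fmt] = counts.get(fmt, 0) + 1
                break  # one format per node
    return counts


# Onboarding a new (made-up) format touches only the registry, not the pass.
NODE_MATCHERS["int8_demo"] = lambda n: n.op == "int8_demo_linear"

graph = [
    Node("q_proj", "fp8_linear"),
    Node("act", "relu"),
    Node("o_proj", "int8_demo_linear"),
]
counts = quantization_pass(graph)
```

The pass itself contains no format-specific branches, which is the property the refactor is aiming for; validating a new format then amounts to registering it and checking that its nodes are tagged.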
Status
Done