[AutoDeploy]: Unify Checkpoint and Graph-Based Quantization Detection #5860

@Fridah-nv

Description

Currently, AutoDeploy handles quantized model detection through two separate mechanisms:

  1. Config parsing (e.g. via _load_quantization_config() in the HF factory)
  2. Quantized-node detection during graph transformation

This split introduces redundancy and complexity when onboarding new quantization formats from sources other than ModelOpt.

We want to introduce a base class to unify detection, transformation, and parameter resolution across all quant sources.

Goals:

  • Support any quantization source (ModelOpt, native graph, etc.) via a unified handler.
  • Remove scattered logic in HF factory for hf_quant_config.json.
  • Make quantization transformation format-agnostic and modular.
  • Reduce copy-paste code when adding new formats.

Subtasks:

  • Introduce New Interface and Registry
  • Migrate ModelOpt + FP8/NVFP4 Handler
  • Refactor HF Factory Logic
  • Refactor Graph Quantization Pass
  • Validate with a new quantization format

Metadata

Labels

AutoDeploy<NV> AutoDeploy Backend

Projects

Status

Done
