
Releases: IST-DASLab/qutlass

v0.1.0

03 Sep 09:53
155e03e


🚀 What is new in QuTLASS v0.1.0:

  • Support for sm_100 GPUs (e.g., NVIDIA B200).
  • NVFP4 Microscaling:
    • Full W4A4 quantization support.
    • Online rotations:
      • Fused transform + quantization + scale computation.
      • Rotation matrices loaded at runtime, allowing any transformation to be applied.
    • NVFP4 Matmul Kernels:
      • CUTLASS-backed NVFP4:NVFP4 kernel with block-scale reordering.
    • Quantization:
      • Abs-Max supported.
  • Multiple rotation sizes (16/32/64/128) supported for both MXFP4 and NVFP4.
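The NVFP4 path described above (an online rotation followed by abs-max scale computation and FP4 rounding) can be illustrated with a small pure-Python reference sketch. It is not QuTLASS's CUDA kernel or its Python API: all function names are hypothetical, the rotation size is assumed equal to the 16-element NVFP4 group, ties round toward the smaller magnitude, and the per-group scale is kept in full precision instead of being stored as FP8 (E4M3) alongside a per-tensor scale. Because the rotation matrix is just an argument, any orthogonal matrix can be passed in place of the Hadamard one, mirroring the runtime-loaded rotations noted above.

```python
import math

# Representable magnitudes of FP4 E2M1; the sign is handled separately.
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def hadamard(n):
    """Normalized Sylvester Hadamard matrix (orthogonal); n must be a power of two."""
    h = [[1.0]]
    while len(h) < n:
        # H_2k = [[H_k, H_k], [H_k, -H_k]]
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    s = 1.0 / math.sqrt(n)
    return [[x * s for x in row] for row in h]


def rotate(group, matrix):
    """Multiply a group (as a row vector) by a rotation matrix supplied at runtime."""
    n = len(group)
    return [sum(group[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def round_to_e2m1(x):
    """Round to the nearest E2M1 value (ties go to the smaller magnitude)."""
    mag = min(E2M1_VALUES, key=lambda v: abs(abs(x) - v))
    return math.copysign(mag, x) if mag else 0.0


def quantize_nvfp4_absmax(values, matrix, group_size=16):
    """Hypothetical helper: rotate each group, then abs-max quantize it to E2M1.

    Simplification: one float scale per group; no E4M3 rounding, no per-tensor scale.
    """
    if len(values) % group_size:
        raise ValueError("len(values) must be a multiple of group_size")
    codes, scales = [], []
    for start in range(0, len(values), group_size):
        g = rotate(values[start:start + group_size], matrix)
        amax = max(abs(v) for v in g)
        scale = amax / 6.0 if amax > 0 else 1.0  # map the group's abs-max onto 6.0
        codes.extend(round_to_e2m1(v / scale) for v in g)
        scales.append(scale)
    return codes, scales


def dequantize(codes, scales, matrix, group_size=16):
    """Rescale each group, then undo the rotation with the transpose (matrix is orthogonal)."""
    mt = [list(col) for col in zip(*matrix)]
    out = []
    for k, start in enumerate(range(0, len(codes), group_size)):
        g = [c * scales[k] for c in codes[start:start + group_size]]
        out.extend(rotate(g, mt))
    return out
```

For example, `quantize_nvfp4_absmax(x, hadamard(16))` on a 32-element list returns 32 FP4 codes and two per-group scales; in each group the largest rotated magnitude lands exactly on the FP4 maximum of 6.0.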

v0.0.1

15 Jul 09:58


Core features of QuTLASS v0.0.1:

  • MXFP4 microscaling support, with weight and activation quantization (W4A4).
  • Online rotations: fused kernel for Hadamard transforms, quantization, and scale computation.
    • Hadamard sizes matching the microscaling group sizes (i.e., 32 for MXFP4).
    • Compatible with any user-defined rotation matrix, since matrices are loaded at runtime.
  • Multiple quantization schemes:
  • Matmul kernels:
    • CUTLASS-backed MXFP4:MXFP4 kernel with block-scale reordering.
    • Prototype kernel for small batch sizes (no reordering required).
  • Transformers Integration (PR #38696)
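For the MXFP4 path, a similar sketch can use a fast Walsh-Hadamard transform of size 32 (matching the MXFP4 group size) followed by a power-of-two shared scale. The scale rule below is the default one from the OCP Microscaling (MX) v1.0 specification and may differ from the quantization schemes QuTLASS actually offers; the function names are hypothetical, and this is a pure-Python illustration rather than the fused kernel.

```python
import math

E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_EMAX = 2  # exponent of the largest E2M1 magnitude (6.0 = 1.5 * 2**2)


def fwht(group):
    """Normalized fast Walsh-Hadamard transform, O(n log n); len(group) must be a power of two."""
    a = list(group)
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                # Butterfly: (x, y) -> (x + y, x - y)
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    s = 1.0 / math.sqrt(len(a))
    return [v * s for v in a]


def e8m0_scale(amax):
    """Power-of-two shared scale following the OCP MX rule 2**(floor(log2(amax)) - emax_elem)."""
    if amax == 0.0:
        return 1.0
    return 2.0 ** (math.floor(math.log2(amax)) - E2M1_EMAX)


def to_e2m1(x):
    """Saturate to +/-6.0, then round to the nearest E2M1 value (ties go to the smaller magnitude)."""
    mag = min(E2M1_VALUES, key=lambda v: abs(min(abs(x), 6.0) - v))
    return math.copysign(mag, x) if mag else 0.0


def hadamard_mxfp4(values, group_size=32):
    """Hypothetical helper: per group, Hadamard-rotate, pick a shared exponent, quantize to E2M1."""
    if len(values) % group_size:
        raise ValueError("len(values) must be a multiple of group_size")
    codes, exps = [], []
    for start in range(0, len(values), group_size):
        g = fwht(values[start:start + group_size])
        scale = e8m0_scale(max(abs(v) for v in g))
        codes.extend(to_e2m1(v / scale) for v in g)
        exps.append(int(math.log2(scale)))  # unbiased exponent; E8M0's bias of 127 is omitted
    return codes, exps
```

Because the normalized Hadamard matrix is its own inverse, dequantizing a group amounts to `fwht([c * 2.0 ** e for c in group_codes])`. Unlike the abs-max scale, the power-of-two scale can push the largest scaled magnitude into [6, 8), where it saturates to 6.0.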