Releases · IST-DASLab/qutlass · GitHub

03 Sep 09:53

LopezCastroRoberto

v0.1.0 Latest

🚀 What is new in QuTLASS v0.1.0:

Support for sm_100 GPUs (e.g., NVIDIA B200).
NVFP4 Microscaling:
- Full W4A4 quantization support.
- Online rotations:
  - Fused transform + quantization + scale computation.
  - Rotation matrices loaded at runtime, allowing any transformation to be applied.
- NVFP4 Matmul Kernels:
  - CUTLASS-backed NVFP4:NVFP4 with block-scale reordering.
- Quantization:
  - Abs-Max supported.
Multiple rotation sizes (16/32/64/128) supported for both MXFP4 and NVFP4.
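
The NVFP4 Abs-Max quantization listed above can be illustrated with a small pure-Python sketch. This is not QuTLASS code: all function names are hypothetical, and the rule `scale = amax / 6` is an assumed abs-max convention. The block size of 16, the FP4 E2M1 element grid, and the FP8 E4M3 block scale follow the NVFP4 format; the second-level per-tensor scale that NVFP4 pipelines commonly use is omitted for brevity.

```python
import math

# Magnitudes representable by the FP4 E2M1 element format.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK = 16  # NVFP4 shares one scale across 16 consecutive elements


def round_to_e2m1(v):
    """Snap v to the nearest E2M1 value, saturating at +/-6.
    Ties go to the smaller magnitude (a simplification of hardware rounding)."""
    mag = min(E2M1_GRID, key=lambda g: abs(g - abs(v)))
    return math.copysign(mag, v)


def round_to_e4m3(x):
    """Round a non-negative float to the nearest FP8 E4M3 value (max 448)."""
    if x <= 0.0:
        return 0.0
    _, e = math.frexp(x)       # x = m * 2**e with 0.5 <= m < 1
    e = max(e - 1, -6)         # leading-bit exponent; -6 is the smallest normal exponent
    step = 2.0 ** (e - 3)      # 3 mantissa bits give 8 steps per binade
    return min(round(x / step) * step, 448.0)


def quantize_nvfp4(values):
    """Hypothetical abs-max quantizer: returns (block scales, E2M1 values)."""
    scales, codes = [], []
    for i in range(0, len(values), BLOCK):
        block = values[i:i + BLOCK]
        amax = max(abs(v) for v in block)
        # Assumed rule: map the block maximum onto the largest E2M1 magnitude.
        scale = round_to_e4m3(amax / 6.0)
        if scale == 0.0:
            codes.extend([0.0] * len(block))
        else:
            codes.extend(round_to_e2m1(v / scale) for v in block)
        scales.append(scale)
    return scales, codes


def dequantize_nvfp4(scales, codes):
    """Reconstruct approximate values from block scales and E2M1 values."""
    return [c * scales[i // BLOCK] for i, c in enumerate(codes)]


if __name__ == "__main__":
    x = [math.sin(i) for i in range(32)]
    s, q = quantize_nvfp4(x)
    print(s)
    print(dequantize_nvfp4(s, q)[:4])
```

Because the block scale is itself an FP8 value rather than a power of two, it can track the block maximum more closely than the MXFP4 scale, at the cost of storing one scale per 16 elements instead of per 32.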

15 Jul 09:58

LopezCastroRoberto

v0.0.1

Core features of QuTLASS v0.0.1:

MXFP4 microscaling support, with:
- Weight and activation quantization (W4A4).
- Online rotations: fused kernel for Hadamard transforms, quantization, and scale computation.
  - Hadamard sizes matching the microscaling group sizes (i.e., 32 for MXFP4).
  - Compatible with any rotation matrix, as the matrices are loaded at runtime.
- Multiple quantization schemes:
  - Quartet (i.e., Quest-like).
  - Abs-Max.
- Matmul kernels:
  - CUTLASS-backed MXFP4:MXFP4 kernel with block-scale reordering.
  - Prototype kernel for small batch sizes (no reordering required).
Transformers Integration (PR #38696).
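
As a rough illustration of the rotate-then-quantize flow described above, the following pure-Python sketch applies a 32-point Hadamard rotation to one group and then quantizes it to MXFP4 with an abs-max scale. It is a reference sketch under stated assumptions, not the fused QuTLASS kernel: all function names are hypothetical, and the shared-exponent rule follows the OCP Microscaling convention (power-of-two scale, E2M1 elements, groups of 32). The rotation is passed in as an argument, mirroring the idea that any matrix can be supplied at runtime.

```python
import math

# Magnitudes representable by the FP4 E2M1 element format.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
GROUP = 32  # MXFP4 shares one power-of-two scale across 32 elements


def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two),
    normalized by 1/sqrt(n) so that the matrix is orthogonal."""
    h = [[1.0]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    s = 1.0 / math.sqrt(n)
    return [[v * s for v in row] for row in h]


def rotate(group, rot):
    """Multiply one group of values by a rotation matrix given as a list of rows."""
    return [sum(r * x for r, x in zip(row, group)) for row in rot]


def round_to_e2m1(v):
    """Snap v to the nearest E2M1 value, saturating at +/-6.
    Ties go to the smaller magnitude (a simplification of hardware rounding)."""
    mag = min(E2M1_GRID, key=lambda g: abs(g - abs(v)))
    return math.copysign(mag, v)


def quantize_mxfp4_group(group):
    """Hypothetical abs-max quantizer: returns (shared exponent, E2M1 values)."""
    amax = max(abs(v) for v in group)
    if amax == 0.0:
        return 0, [0.0] * len(group)
    _, e = math.frexp(amax)    # amax = m * 2**e with 0.5 <= m < 1
    exp = (e - 1) - 2          # floor(log2(amax)) minus the E2M1 max exponent (2)
    scale = 2.0 ** exp
    return exp, [round_to_e2m1(v / scale) for v in group]


if __name__ == "__main__":
    rot = hadamard(GROUP)                  # could be any 32 x 32 rotation
    x = [float(i) for i in range(GROUP)]
    exp, q = quantize_mxfp4_group(rotate(x, rot))
    print(exp, q[:4])
```

Matching the rotation size to the group size means each rotated group maps onto exactly one shared scale, which is what makes fusing the transform, the scale computation, and the quantization into a single pass natural.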
