Section 01
Introduction: DuQuant++ — A New Fine-Grained Rotational Quantization Scheme for MXFP4 Format
DuQuant++ is a fine-grained rotational quantization method for the MXFP4 microscaling format. It sets the rotation block size equal to the MXFP4 group size, so each rotation redistributes activation outliers only among values that share a quantization scale. The method maintains state-of-the-art (SOTA) performance while halving the cost of online rotation, offering a new route to efficient deployment of large models at 4-bit precision.
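To make the idea of a rotation block matched to the quantization group concrete, here is a minimal sketch in plain Python. It is not the DuQuant++ implementation: a normalized Walsh-Hadamard transform stands in for the method's rotation, the activation values are made up, and the helper names (`hadamard_rotate`, `mxfp4_fake_quant`, `blockwise`) are hypothetical. It does assume the MXFP4 layout from the OCP Microscaling specification: groups of 32 FP4 (E2M1) elements sharing one power-of-two scale.

```python
import math

GROUP = 32  # MXFP4 group size: 32 elements share one power-of-two scale
FP4_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)  # E2M1 magnitudes


def hadamard_rotate(block):
    """Orthonormal Walsh-Hadamard transform (stand-in rotation).

    The block length must be a power of two. The normalized transform
    is symmetric and orthogonal, so applying it twice undoes it.
    """
    x = list(block)
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    s = 1.0 / math.sqrt(n)
    return [v * s for v in x]


def mxfp4_fake_quant(group):
    """Quantize one group to MXFP4 and dequantize it again.

    Simplification: ties go to the smaller grid magnitude.
    """
    amax = max(abs(v) for v in group)
    if amax == 0.0:
        return [0.0] * len(group)
    # Shared scale: 2^(floor(log2(amax)) - emax), with emax = 2 for E2M1.
    scale = 2.0 ** (math.floor(math.log2(amax)) - 2)
    out = []
    for v in group:
        mag = min(abs(v) / scale, 6.0)  # clamp to the largest FP4 value
        q = min(FP4_GRID, key=lambda g: abs(g - mag))
        out.append(math.copysign(q * scale, v))
    return out


def blockwise(fn, vec, size=GROUP):
    """Apply fn independently to consecutive blocks of `size` elements."""
    out = []
    for i in range(0, len(vec), size):
        out.extend(fn(vec[i:i + size]))
    return out


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Toy activation vector (made-up values) with one outlier in the first group.
act = [0.1 * ((i % 7) - 3) for i in range(2 * GROUP)]
act[5] = 40.0

# Rotation block == quantization group: each rotation mixes only values
# that will share a scale.
rotated = blockwise(hadamard_rotate, act)
deq_rot = blockwise(mxfp4_fake_quant, rotated)
restored = blockwise(hadamard_rotate, deq_rot)  # rotate back

deq_plain = blockwise(mxfp4_fake_quant, act)

print("MSE without rotation:", mse(act, deq_plain))
print("MSE with block rotation:", mse(act, restored))
```

Because `blockwise` uses the same `GROUP` boundary for the rotation and for the shared scale, an outlier in one group never affects the scale of another. Whether the rotated path gives a lower error depends on the data and on the rotation used, so the two printed values are best read as a way to experiment, not as a result.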