MUXQ: A New Method for High-Precision INT8 Quantization via Low-Rank Outlier Decomposition
This article introduces MUXQ, a quantization method that targets the activation-outlier problem in large-model quantization. MUXQ detects outlier channels in the activations and introduces a low-rank auxiliary matrix that absorbs their magnitudes, overcoming a key limitation of existing approaches: a few extreme channels inflate the quantization scale and destroy precision for the rest. On the GPT-2 family of models, the method achieves INT8 accuracy close to FP16 while keeping a single, uniform computation structure, making it well suited to deployment and acceleration on edge NPUs.
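The core idea can be sketched in a few lines. The snippet below is a minimal illustration of the general technique, not the actual MUXQ implementation: it flags outlier channels by comparing per-channel peak magnitudes against a threshold (the 3-sigma rule here is an assumption), moves those channels into a residual matrix, replaces the residual with a rank-k approximation via truncated SVD, and quantizes the remaining well-behaved activations to INT8. The function names, threshold, and rank are all illustrative choices.

```python
import numpy as np

def int8_quant(x):
    """Symmetric per-tensor INT8 quantization (sketch)."""
    max_abs = np.abs(x).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def outlier_lowrank_decompose(X, k=4, sigma=3.0):
    """Split activations X into a well-behaved part plus a rank-k
    outlier correction. Threshold rule and rank are illustrative,
    not the published MUXQ algorithm."""
    # Flag outlier channels: columns whose peak magnitude exceeds
    # sigma standard deviations above the mean per-channel peak.
    col_max = np.abs(X).max(axis=0)
    thresh = col_max.mean() + sigma * col_max.std()
    outlier = col_max > thresh
    # Residual R keeps only the outlier channels; X_main is the rest.
    R = np.zeros_like(X)
    R[:, outlier] = X[:, outlier]
    X_main = X - R
    # Rank-k approximation of the residual via truncated SVD,
    # playing the role of the low-rank auxiliary matrix.
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    L = (U[:, :k] * s[:k]) @ Vt[:k]
    return X_main, L, outlier

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 128)).astype(np.float64)
X[:, 5] *= 40.0  # inject one extreme outlier channel
X_main, L, mask = outlier_lowrank_decompose(X)
Xq, scale = int8_quant(X_main)
# Reconstruction: dequantized main path plus low-rank correction.
X_hat = Xq.astype(np.float64) * scale + L
err = np.abs(X - X_hat).max()
```

Without the decomposition, the single outlier channel would force the quantization scale roughly an order of magnitude larger, spending most of the INT8 range on one channel; after splitting, the main path quantizes tightly and the low-rank term restores the outliers in floating point.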