Section 01
Introduction / Main Post: MUXQ: Hybrid-to-Unified Matrix Quantization via Low-Rank Outlier Decomposition
This article introduces MUXQ, a quantization method that targets the outlier problem in large-model quantization. It detects outlier channels in the activations and introduces an auxiliary matrix that redistributes the outlier magnitudes. On the GPT-2 family of models, its INT8 quantization accuracy is close to that of FP16.
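To make the outlier problem concrete, here is a minimal NumPy sketch. It is not the MUXQ algorithm itself: the detection rule (`find_outlier_channels` with a hypothetical `ratio` threshold), the additive split into a body matrix plus an auxiliary matrix, and the synthetic data are all assumptions made for illustration. It only shows why moving outlier channels into a separate matrix (whose rank is at most the number of outlier channels) can let both parts be quantized to INT8 with less error than quantizing the whole activation matrix at once.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization; returns (int8 values, scale)."""
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map INT8 values back to float32."""
    return q.astype(np.float32) * scale

def find_outlier_channels(x, ratio=6.0):
    """Hypothetical rule: a channel (column) is an outlier when its largest
    absolute value exceeds `ratio` times the median of all channel maxima."""
    ch_max = np.max(np.abs(x), axis=0)
    return np.where(ch_max > ratio * np.median(ch_max))[0]

def split_outliers(x, outlier_idx):
    """Assumed decomposition: x = body + aux, where aux is nonzero only in
    the outlier channels, so rank(aux) <= len(outlier_idx)."""
    aux = np.zeros_like(x)
    aux[:, outlier_idx] = x[:, outlier_idx]
    body = x - aux
    return body, aux

# Synthetic activations: 64 tokens x 32 channels, two channels inflated 50x.
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 32)).astype(np.float32)
x[:, [3, 17]] *= 50.0

outlier_idx = find_outlier_channels(x)
body, aux = split_outliers(x, outlier_idx)

# Baseline: one INT8 scale for the whole matrix, dominated by the outliers.
q, s = quantize_int8(x)
err_plain = float(np.mean((dequantize(q, s) - x) ** 2))

# Split: body and auxiliary matrix are each quantized to INT8 separately.
qb, sb = quantize_int8(body)
qa, sa = quantize_int8(aux)
x_hat = dequantize(qb, sb) + dequantize(qa, sa)
err_split = float(np.mean((x_hat - x) ** 2))

print("outlier channels:", outlier_idx.tolist())
print("MSE, plain INT8:", err_plain)
print("MSE, split INT8:", err_split)
```

In this toy setup both matrices stay in INT8, which is the sense in which the scheme is "unified" rather than mixed-precision; how MUXQ actually builds and applies its auxiliary matrix should be taken from the paper, not from this sketch.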