Zing Forum

SigmaScale: A Large Language Model Compression Method Based on SVD Low-Rank Decomposition and Learned Scaling Matrices

SigmaScale improves large language model (LLM) compression based on truncated singular value decomposition (SVD) by learning auxiliary scaling matrices. It optimizes row and column scaling transformations under an activation-aware compression loss, which reduces the effective intrinsic rank of the weight matrices.

Tags: LLM compression, SVD low-rank decomposition, model quantization, activation-aware compression, scaling matrices
Published 2026-06-05 17:48 | Recent activity 2026-06-08 11:26 | Estimated read 6 min

Section 01

[Introduction] SigmaScale: An LLM Compression Method Based on SVD and Learned Scaling Matrices

SigmaScale is a compression method for large language models (LLMs). It builds on truncated singular value decomposition (SVD) and improves it by learning auxiliary scaling matrices. Guided by an activation-aware compression loss, it optimizes row and column scaling transformations that reduce the effective intrinsic rank of the weight matrices, so the model can be compressed while largely preserving its performance. This article covers the background, the method, and the experiments.

Section 02

Research Background: Necessity of LLM Compression and Limitations of Traditional SVD Methods

Large language models such as GPT-4 and Llama 3 have tens or even hundreds of billions of parameters and need substantial resources for training and inference, which makes model compression a key problem. Low-rank decomposition based on SVD is an important compression approach, but traditional SVD methods have three limitations: they do not adapt to the structure of a model's weights, they ignore activation information, and the theoretically optimal solution (the one with the smallest weight reconstruction error) may differ from the one that works best in practice.
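
For readers who want to see the baseline concretely, here is a minimal NumPy sketch of plain truncated SVD on a single weight matrix. The matrix size, the rank `k`, and the function name `truncated_svd` are illustrative assumptions, not values from the paper.

```python
import numpy as np

def truncated_svd(W, k):
    """Return factors (A, B) such that A @ B is the best rank-k approximation of W."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :k] * s[:k]   # shape (m, k); singular values folded into the left factor
    B = Vt[:k, :]          # shape (k, n)
    return A, B

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 48))   # synthetic stand-in for a weight matrix
k = 8
A, B = truncated_svd(W, k)

# Weight-only reconstruction error in the Frobenius norm.
err = np.linalg.norm(W - A @ B)

# Eckart-Young: this error equals the root of the summed squares
# of the discarded singular values.
s = np.linalg.svd(W, compute_uv=False)
expected = np.sqrt(np.sum(s[k:] ** 2))
print(err, expected)
```

Note that the quantity minimized here depends only on the weights. Nothing in it reflects which input directions the layer actually sees at inference time, which is the gap the limitations above describe.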

Section 03

Core Innovations: Learned Scaling Matrices and Activation-Aware Compression Strategy

SigmaScale has three core innovations: 1. It replaces analytically derived scaling with scaling matrices learned end to end, so the scaling adapts to the weight distribution of each model and layer; 2. It introduces an activation-aware compression loss that accounts for the interaction between weights and activations and prioritizes the components with the largest impact on the output; 3. It learns two sets of scaling vectors, one for rows and one for columns, which rescale the weight matrix and reduce its effective intrinsic rank.
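
The three points above can be sketched in NumPy under some loudly stated assumptions. The article does not give the exact formulas, so the form used here (truncate `diag(r) @ W @ diag(c)`, then undo the scaling) and the loss (squared Frobenius error of the layer output on calibration inputs `X`) are assumptions. The learning step itself is omitted: the row scales below are a hand-picked heuristic standing in for learned values, and all names are hypothetical.

```python
import numpy as np

def scaled_truncated_svd(W, r, c, k):
    """Rank-k approximation of W computed in a rescaled space.

    r: positive row scales, shape (m,); c: positive column scales, shape (n,).
    """
    W_scaled = r[:, None] * W * c[None, :]       # diag(r) @ W @ diag(c)
    U, s, Vt = np.linalg.svd(W_scaled, full_matrices=False)
    low_rank = (U[:, :k] * s[:k]) @ Vt[:k, :]    # truncate in the scaled space
    return low_rank / r[:, None] / c[None, :]    # undo the scaling

def activation_aware_loss(X, W, W_hat):
    """Squared Frobenius error of the layer output on calibration inputs X."""
    return float(np.linalg.norm(X @ W - X @ W_hat) ** 2)

rng = np.random.default_rng(0)
m, n, k = 32, 24, 6
W = rng.standard_normal((m, n))
# Synthetic calibration activations with very uneven channel magnitudes.
channel_scale = np.logspace(-1, 1, m)
X = rng.standard_normal((256, m)) * channel_scale

ones_r, ones_c = np.ones(m), np.ones(n)
plain = scaled_truncated_svd(W, ones_r, ones_c, k)   # reduces to plain truncated SVD

# Assumption: per-channel activation RMS as row scales, in place of learned ones.
r_heur = np.sqrt(np.mean(X ** 2, axis=0))
scaled = scaled_truncated_svd(W, r_heur, ones_c, k)

loss_plain = activation_aware_loss(X, W, plain)
loss_scaled = activation_aware_loss(X, W, scaled)
print(f"plain SVD loss: {loss_plain:.2f}, scaled SVD loss: {loss_scaled:.2f}")
```

In a learned setup, `r` and `c` would be trainable parameters updated by gradient descent on this loss. The sketch only shows how the scaling enters the decomposition and how the objective is evaluated.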

Section 04

Effective Rank Analysis: How Scaling Transformations Improve Compression

The learned scaling transformations reduce the effective intrinsic rank of the weight matrices, which shows up as a drop in the entropy-based effective rank. The lower the effective rank, the smaller the performance loss from compression. This relationship is consistent across models and layers, which suggests that good compression is not only about removing parameters: the parameter distribution also has to be reorganized so that the key information is concentrated in fewer components.
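
The article does not define its effective rank measure. A common entropy-based definition, assumed here, is the exponential of the Shannon entropy of the normalized singular values. The sketch below computes it and shows that a diagonal rescaling alone can change it. The function name and the toy matrices are illustrative.

```python
import numpy as np

def effective_rank(W, eps=1e-12):
    """exp(Shannon entropy) of the singular values normalized to sum to 1."""
    s = np.linalg.svd(W, compute_uv=False)
    p = s / (s.sum() + eps)
    p = p[p > eps]                       # drop numerically zero components
    entropy = -np.sum(p * np.log(p))
    return float(np.exp(entropy))

identity = np.eye(8)                               # flat spectrum
rank_one = np.outer(np.arange(1.0, 9.0), np.ones(8))  # a single nonzero singular value

# Scaling one row by 100 concentrates the spectrum in one direction.
r = np.ones(8)
r[0] = 100.0
rescaled = r[:, None] * identity

print(effective_rank(identity), effective_rank(rank_one), effective_rank(rescaled))
```

A flat spectrum gives an effective rank equal to the matrix dimension, a single dominant singular value gives a value near 1, and the rescaled identity falls in between. A matrix whose effective rank is low loses less when its small singular values are truncated.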

Section 05

Experimental Results: Performance of SigmaScale on Mainstream Models

Experiments were conducted on Llama 3.1 8B Instruct and Qwen3-8B, evaluated with perplexity and zero-shot benchmarks. The results show that SigmaScale's perplexity is comparable to that of state-of-the-art (SOTA) SVD compression methods, that its zero-shot performance is highly competitive, and that it has a clear advantage on some specific tasks.
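
As a reminder of what the perplexity metric measures, here is a small sketch using the standard definition: the exponential of the mean negative log-likelihood per token. The log-probabilities below are made up for illustration and are not results from the experiments.

```python
import math

def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood (natural log) over tokens."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# A model that assigns probability 1/4 to every token has perplexity 4.
uniform = [math.log(1 / 4)] * 10
print(perplexity(uniform))
```

A compressed model is usually judged by how little its perplexity rises relative to the uncompressed model on the same text.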

Section 06

Technical Advantages: Flexibility, Activation Awareness, and Practical Value

SigmaScale has four main advantages: 1. Flexibility: the scaling is learned, so it adapts to different model architectures; 2. Activation awareness: the objective reflects actual inference behavior, which gives stable performance; 3. Interpretability: the scaling matrices indicate which weights matter most; 4. Practical value: it helps deploy LLMs in resource-constrained environments and reduces cost.

Section 07

Limitations and Prospects: Future Optimization Directions

SigmaScale has three limitations: training the scaling matrices adds computational overhead; the achievable compression ratio is limited by the rank of the original weight matrix; and it has not yet been combined with other compression techniques. Future work could explore more efficient optimization algorithms and combinations with quantization or pruning.
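
One way to read the rank-related limit is through parameter counts: a rank-k factorization of an m-by-n matrix stores k(m + n) numbers instead of mn, so it only saves space when k is below mn / (m + n). The sketch below works through that arithmetic. The 4096-by-4096 shape and the 80% budget are illustrative assumptions, not figures from the article.

```python
def factored_params(m, n, k):
    """Parameters stored by a rank-k factorization: (m x k) plus (k x n)."""
    return k * (m + n)

def max_rank_for_ratio(m, n, keep_ratio):
    """Largest rank k whose factors use at most keep_ratio * m * n parameters."""
    return int(keep_ratio * m * n) // (m + n)

m = n = 4096
break_even = max_rank_for_ratio(m, n, 1.0)   # rank at which nothing is saved
budget_rank = max_rank_for_ratio(m, n, 0.8)  # rank allowed when keeping 80% of parameters

print(break_even, budget_rank, factored_params(m, n, budget_rank))
```

For a square matrix the break-even rank is half the dimension, so even a mild parameter reduction forces a large cut in rank. This is why lowering the effective rank before truncation matters for how far compression can go.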

Section 08

Summary: Contributions of SigmaScale to the LLM Compression Field

SigmaScale improves SVD-based compression by learning scaling matrices, a notable step forward for LLM compression. By combining an activation-aware loss with end-to-end learning, it compresses models effectively while maintaining performance. It offers a new option for reducing LLM deployment costs and a new perspective for research on compression theory.