Section 01
[Introduction] Key Findings from Measuring Activation Values in Open-Source Large Models
This article presents a systematic measurement study of the dynamic range of activation values in modern open-source large language models. It reports three main findings: the maximum activation values of different model families differ by nearly four orders of magnitude; MoE architectures show significantly lower activation values than dense models of the same scale; and the residual stream carries the global maximum activation values. These findings bear directly on low-bit quantized deployment, and they support treating activation range as a model attribute that should be measured and reported.
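To illustrate why the maximum activation value matters for low-bit quantization, the following is a minimal sketch under stated assumptions. It uses symmetric per-tensor absmax quantization, which is a common scheme but is chosen here only for illustration and is not necessarily the one the article evaluates. The function names `absmax_quantize` and `dequantize` and all numeric values are hypothetical and synthetic, not measurements from the study.

```python
def absmax_quantize(values, bits=8):
    """Symmetric per-tensor quantization: scale by the largest magnitude.

    Assumed scheme for illustration only; real deployments may use
    per-channel or per-token scaling instead.
    """
    qmax = 2 ** (bits - 1) - 1            # 127 for 8-bit signed codes
    peak = max(abs(v) for v in values)    # the maximum activation value
    scale = peak / qmax if peak > 0 else 1.0
    codes = [round(v / scale) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate real values."""
    return [c * scale for c in codes]


# Synthetic activations (illustrative, not data from the article).
typical = [0.5, -1.0, 2.0, -0.25]
with_outlier = typical + [2000.0]         # one large outlier activation

codes_a, scale_a = absmax_quantize(typical)
codes_b, scale_b = absmax_quantize(with_outlier)

# With the outlier present, the step size grows with the peak, so the
# four ordinary values all collapse to code 0 and are lost on dequantize.
recovered_b = dequantize(codes_b, scale_b)
```

In this sketch the quantization step is proportional to the peak magnitude, so a model whose maximum activation is orders of magnitude larger leaves far less resolution for its ordinary activations at the same bit width. That is one reason a reported activation range can be useful when choosing a quantization scheme.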