Section 01
FRM-PTQ: A New Low-Bit Large Model Quantization Method Enhanced by Feature Relationship Matching (Introduction)
A research team from Harbin Institute of Technology (Shenzhen) has proposed FRM-PTQ, a post-training quantization framework. By combining feature relationship matching with multi-granularity group quantization, it achieves near-full-precision inference accuracy in the W4A4 setting (4-bit weights and 4-bit activations), while delivering a 2x throughput improvement and 3.17x memory compression. It is particularly well suited to new-generation models such as LLaMA-3 and Qwen2.5.
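
To make the idea of group quantization concrete, the following is a minimal sketch of generic symmetric 4-bit group-wise quantization in plain Python. It is not the FRM-PTQ algorithm: it does not implement feature relationship matching or the multi-granularity scheme, and the function names (`quantize_group`, `quantize_groupwise`, `dequantize_groupwise`) and the sample values are hypothetical, chosen only for illustration. It shows why giving each small group its own scale tends to reduce error when a row mixes small and large values.

```python
from typing import List, Tuple

Group = Tuple[List[int], float]  # (integer codes, shared scale)


def quantize_group(values: List[float], bits: int = 4) -> Group:
    """Symmetric quantization of one group using a single shared scale."""
    qmax = 2 ** (bits - 1) - 1       # 7 for 4-bit
    qmin = -(2 ** (bits - 1))        # -8 for 4-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer code, then clamp to the signed range.
    codes = [min(qmax, max(qmin, round(v / scale))) for v in values]
    return codes, scale


def quantize_groupwise(row: List[float], group_size: int, bits: int = 4) -> List[Group]:
    """Split a row into consecutive groups and quantize each independently."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return [
        quantize_group(row[start:start + group_size], bits)
        for start in range(0, len(row), group_size)
    ]


def dequantize_groupwise(groups: List[Group]) -> List[float]:
    """Map integer codes back to floats using each group's scale."""
    out: List[float] = []
    for codes, scale in groups:
        out.extend(c * scale for c in codes)
    return out


def total_abs_error(row: List[float], groups: List[Group]) -> float:
    """Sum of absolute reconstruction errors over the whole row."""
    restored = dequantize_groupwise(groups)
    return sum(abs(a - b) for a, b in zip(row, restored))


# Hypothetical row: the first half is small, the second half is large.
row = [0.1, -0.2, 0.05, 0.7, 10.0, -3.0, 2.5, 8.0]

per_row = quantize_groupwise(row, group_size=len(row))  # one scale for everything
per_group = quantize_groupwise(row, group_size=4)       # one scale per 4 values

print("per-row error:  ", total_abs_error(row, per_row))
print("per-group error:", total_abs_error(row, per_group))
```

In this sketch the group size is the only granularity knob. A smaller group tracks local value ranges more closely but stores more scales, which is the trade-off that any group quantization scheme has to balance.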