Section 01
[Introduction] Breaking the VRAM Bottleneck: Lossless Compression Pushes Large Model Weights Close to the Shannon Limit
Researchers report that LLM weights carry 2x to 10x statistical redundancy and propose a real-time lossless decompression framework built on Asymmetric Numeral Systems (ANS). With no loss of model accuracy, the framework raises the batch size of Qwen-14B by 60% and that of Mixtral-176B by 4.8x. Its compression ratio approaches the Shannon limit, opening a new path for deploying large models.
Original paper source: arXiv (2606.15789v1), published on June 14, 2026.
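
To make the "Shannon limit" in the summary concrete, here is a minimal sketch that measures the empirical entropy of one field of stored weights. It is an illustration under stated assumptions, not the paper's method: the weights are synthetic Gaussian samples rather than a real checkpoint, the bfloat16 storage format and the choice to look only at the exponent field are assumptions, and the helper names `shannon_entropy_bits` and `bf16_exponent` are hypothetical. The entropy it prints is the lower bound on bits per symbol that an order-0 entropy coder such as ANS can approach.

```python
import math
import random
import struct
from collections import Counter


def shannon_entropy_bits(symbols):
    """Empirical order-0 Shannon entropy in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def bf16_exponent(x):
    """8-bit exponent field of x after truncating float32 to bfloat16."""
    bits32 = struct.unpack("<I", struct.pack("<f", x))[0]
    bits16 = bits32 >> 16        # bfloat16 keeps the top 16 bits of float32
    return (bits16 >> 7) & 0xFF  # skip 7 mantissa bits, mask off the sign bit


# Assumption: synthetic Gaussian weights stand in for a real model tensor.
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(50_000)]
exponents = [bf16_exponent(w) for w in weights]

h = shannon_entropy_bits(exponents)
print(f"exponent entropy: {h:.2f} bits out of 8 stored")
print(f"ideal lossless ratio on the exponent field: {8 / h:.2f}x")
```

Because the synthetic weights cluster in a narrow range of magnitudes, only a few exponent values occur often, so the measured entropy falls well below the 8 bits actually stored. That gap is the kind of redundancy a lossless coder can remove without touching accuracy.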