Section 01
Introduction: A New Distributed Text Compression Paradigm Combining LLMs and Arithmetic Coding
The SMU research team has open-sourced the first hybrid text compression system that combines Transformer-based LLMs with arithmetic coding. The system performs multi-GPU distributed compression on an NVIDIA DGX A100 SuperPOD and supports four model architectures (BERT, RoBERTa, T5, and Llama-3.2-3B), bringing a new paradigm to the field of text compression.
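The core idea behind LLM-based arithmetic coding is that a language model's next-token probabilities drive the coder: likely tokens narrow the coding interval only slightly (few bits), while surprising tokens narrow it sharply (many bits). The following minimal sketch illustrates that coupling with exact fractions and a toy, context-independent probability model standing in for the LLM; the function and symbol names are illustrative assumptions, not the released system's API, and a real implementation would use finite-precision renormalized arithmetic coding and query the Transformer at every step.

```python
from fractions import Fraction

def next_token_probs(context):
    # Toy stand-in for an LLM's next-token distribution (hypothetical).
    # A real system would run the Transformer on `context` here.
    return {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}

def encode(tokens):
    """Shrink the interval [low, low+width) once per token."""
    low, width = Fraction(0), Fraction(1)
    for i, tok in enumerate(tokens):
        probs = next_token_probs(tokens[:i])
        cum = Fraction(0)
        for sym in sorted(probs):           # fixed symbol order shared with decoder
            if sym == tok:
                low += width * cum          # move to the sub-interval for `tok`
                width *= probs[sym]         # shrink by the token's probability
                break
            cum += probs[sym]
    # Any point inside the final interval identifies the whole sequence;
    # its width is the product of token probabilities (-log2(width) ~ bits).
    return low + width / 2, width

def decode(point, n):
    """Replay the same interval subdivision to recover n tokens."""
    tokens = []
    low, width = Fraction(0), Fraction(1)
    for _ in range(n):
        probs = next_token_probs(tokens)
        cum = Fraction(0)
        for sym in sorted(probs):
            hi = cum + probs[sym]
            if low + width * cum <= point < low + width * hi:
                tokens.append(sym)
                low += width * cum
                width *= probs[sym]
                break
            cum = hi
    return tokens
```

Because the decoder queries the identical model with the identical context at each step, the probability tables never need to be transmitted; this is what makes a strong LLM such an effective (if computationally heavy) compressor.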