Section 01
Introduction: Compress-Distill—Reasoning Trace Compression Boosts Knowledge Distillation Efficiency
The research team proposes Compress-Distill, a method that improves the efficiency of knowledge distillation by compressing the chain-of-thought (CoT) traces of reasoning models as a post-processing step. Key findings: compressed traces cut training tokens to 12-30% of the original, speed up training by 2.0-7.6x, and make inference outputs 3-19x shorter; small student models retain 96% of the original accuracy while gaining an 18x improvement in token efficiency. The method thus trades a small loss in accuracy for large savings in training and inference cost.
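The summary reports token savings as a retained percentage (12-30% of the original) but speed and length gains as multipliers (2.0-7.6x, 3-19x). To compare them on one scale, the retained fraction can be converted to a reduction factor. The sketch below is plain arithmetic on the figures quoted above; the helper name `reduction_factor` is hypothetical and not taken from the paper.

```python
def reduction_factor(retained_fraction: float) -> float:
    """Convert 'fraction of tokens retained' into an 'N-times fewer tokens' factor.

    Hypothetical helper for illustration; not part of the paper's code.
    """
    if not 0 < retained_fraction <= 1:
        raise ValueError("retained_fraction must be in (0, 1]")
    return 1.0 / retained_fraction


# Range quoted in the summary: compressed traces keep 12-30% of the training tokens.
low = reduction_factor(0.30)   # least aggressive compression
high = reduction_factor(0.12)  # most aggressive compression

print(f"{low:.1f}x to {high:.1f}x fewer training tokens")
# prints "3.3x to 8.3x fewer training tokens"
```

On this scale, the reported 2.0-7.6x training speedup is somewhat smaller than the corresponding token reduction of roughly 3.3-8.3x.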
Original Paper Info: arXiv preprint (June 4, 2026), title "Compress-Distill: Reasoning Trace Compression for Efficient Knowledge Distillation", link http://arxiv.org/abs/2606.05988v1