Section 01
CLP: A Guide to Zero-Loss Adaptive Multi-Token Inference Acceleration
CLP is a lightweight multi-token inference acceleration scheme built on two elements: the Backbone-as-Architect design principle and a very simple linear decision layer, the CLP predictor. On the Qwen2.5 model series (0.5B, 1.5B, 7B) it achieves a 1.14x-1.29x end-to-end speedup with no degradation in generation quality, addressing the quality decline that head-backbone competition causes in traditional multi-token prediction (MTP) methods.
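To put the reported speedup range in more familiar terms, the short sketch below converts an end-to-end speedup factor into the fraction of wall-clock time saved. It uses only the 1.14x and 1.29x figures quoted above; the helper name `latency_reduction` is hypothetical and chosen for illustration, and the calculation assumes the speedup is a simple ratio of baseline time to accelerated time.

```python
def latency_reduction(speedup: float) -> float:
    """Return the fraction of wall-clock time saved for a given speedup.

    Assumes speedup = baseline_time / accelerated_time, so the accelerated
    run takes 1 / speedup of the baseline time.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


# The two endpoints of the range reported for the Qwen2.5 models.
for s in (1.14, 1.29):
    print(f"{s:.2f}x speedup -> {latency_reduction(s):.1%} less time")
```

Under that assumption, the reported range corresponds to roughly 12% to 22% less end-to-end time per generation.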