Section 01
Introduction / Original Post: MTP-D: Self-Distillation Boosts Multi-Token Prediction, Achieving 220% Inference Acceleration
MTP-D uses self-distillation to raise the acceptance rate of multi-token prediction (MTP) heads by 7.5%. Its looped extension strategy delivers a 220.4% inference acceleration over single-head MTP, pointing to a new direction for optimizing LLM inference efficiency.
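
To make the link between acceptance rate and acceleration concrete, here is a minimal sketch of how per-head acceptance rates translate into tokens emitted per verification step. It is not taken from the MTP-D work. It assumes a simple chain-style draft-and-verify scheme, where a head's draft token is kept only if every earlier draft was accepted, and where each rate is the conditional probability of acceptance given that the earlier drafts were accepted. The function name `expected_tokens_per_step` and the input rates in the usage note are hypothetical, chosen only for illustration.

```python
def expected_tokens_per_step(acceptance_rates):
    """Expected tokens emitted per verification step for a chain of draft heads.

    Assumption (not from the source): head k's draft survives only if all
    earlier drafts were accepted, and the main model always contributes
    one token of its own per step.
    """
    expected = 1.0      # the token produced by the main model itself
    prefix_prob = 1.0   # probability that all drafts so far were accepted
    for rate in acceptance_rates:
        if not 0.0 <= rate <= 1.0:
            raise ValueError("acceptance rates must lie in [0, 1]")
        prefix_prob *= rate
        expected += prefix_prob
    return expected
```

Calling `expected_tokens_per_step([0.5, 0.5])` with these made-up rates gives 1.75 tokens per step, against 1.0 with no draft heads. Under this model, raising the acceptance rate of early heads helps most, because every later head's contribution is multiplied by it. The estimate ignores the cost of running the draft heads, so it is an optimistic proxy for wall-clock acceleration rather than a prediction of it.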