Section 01
EGRSD: Entropy-Aware Self-Distillation for Enhancing Reasoning Efficiency of Large Language Models (Introduction)
This article proposes EGRSD (Entropy-Guided Reinforced Self-Distillation), a method that addresses the uniform-weighting problem of existing self-distillation approaches: an entropy-based confidence gate on the teacher model dynamically adjusts the supervision weight of each position in the reasoning chain. The method shortens reasoning length while maintaining accuracy and has been validated on the Qwen3 model; a causal look-ahead variant, CL-EGRSD, is also introduced to further refine the supervision signal. The article covers the background, methodology, experiments, and significance of the approach.
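To make the gating idea concrete, here is a minimal PyTorch sketch of entropy-gated per-token distillation. It is not the paper's exact formulation: the sigmoid gate and the `tau`/`beta` parameters are illustrative assumptions. Each position's teacher entropy is mapped to a confidence weight, which then scales a token-level KL distillation loss.

```python
import torch
import torch.nn.functional as F

def entropy_gated_distill_loss(student_logits, teacher_logits, tau=2.0, beta=4.0):
    """Entropy-gated per-token distillation loss (illustrative sketch).

    student_logits, teacher_logits: [batch, seq_len, vocab_size]
    tau, beta: assumed gating hyperparameters (entropy threshold, sharpness).
    """
    t_logp = F.log_softmax(teacher_logits, dim=-1)
    t_p = t_logp.exp()
    # Per-position teacher entropy: low entropy = confident teacher token.
    entropy = -(t_p * t_logp).sum(dim=-1)             # [batch, seq_len]
    # Confidence gate: down-weight positions where the teacher is uncertain.
    weights = torch.sigmoid(beta * (tau - entropy))   # [batch, seq_len]
    s_logp = F.log_softmax(student_logits, dim=-1)
    # Token-level KL(teacher || student), scaled by the per-position gate.
    kl = (t_p * (t_logp - s_logp)).sum(dim=-1)        # [batch, seq_len]
    return (weights * kl).sum() / weights.sum().clamp_min(1e-8)
```

In this sketch, positions where the teacher is confident (low entropy) receive near-full supervision weight, while uncertain positions contribute little, in contrast to the uniform weighting the article critiques.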