Section 01
Introduction: On-Policy Distillation—A Paradigm Shift in LLM Knowledge Distillation
This article draws on the AwesomeOPD repository maintained by nick7nlp and the related papers to give an in-depth explanation of On-Policy Distillation (OPD). OPD targets a structural problem in traditional knowledge distillation: exposure bias. The student is trained on sequences it did not generate, yet at inference time it conditions on its own earlier tokens, so small mistakes compound and the resulting error can grow quadratically with sequence length. OPD instead has the teacher model provide feedback on the outputs the student model actually generates. This shifts the paradigm from "imitation" to "error correction" and offers a new path for capability transfer in large language models.
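To make the idea of teacher feedback on student-generated outputs concrete, here is a minimal toy sketch in Python. It is not taken from the AwesomeOPD repository. The three-token vocabulary, the functions `teacher_probs` and `student_probs`, and the use of per-token reverse KL divergence as the feedback signal are all assumptions chosen for illustration. Reverse KL is one common choice, and specific OPD methods may use a different signal.

```python
import math
import random

# Hypothetical three-token vocabulary, for illustration only.
VOCAB = ["a", "b", "c"]

def teacher_probs(prefix):
    # Hypothetical teacher: prefers "a" right after an "a", otherwise prefers "b".
    if prefix and prefix[-1] == "a":
        return [0.7, 0.2, 0.1]
    return [0.1, 0.8, 0.1]

def student_probs(prefix):
    # Hypothetical student: a flatter distribution that ignores the context.
    return [0.4, 0.4, 0.2]

def reverse_kl(p_student, p_teacher):
    # KL(student || teacher) over the next-token distribution.
    return sum(
        ps * math.log(ps / pt)
        for ps, pt in zip(p_student, p_teacher)
        if ps > 0
    )

def on_policy_losses(length, rng):
    # The student generates the sequence itself; the teacher is only
    # queried on the prefixes the student actually visits.
    prefix = []
    losses = []
    for _ in range(length):
        ps = student_probs(prefix)
        pt = teacher_probs(prefix)
        losses.append(reverse_kl(ps, pt))
        token = rng.choices(VOCAB, weights=ps, k=1)[0]
        prefix.append(token)
    return prefix, losses

rng = random.Random(0)
sequence, losses = on_policy_losses(5, rng)
print(sequence, [round(x, 3) for x in losses])
```

In a real training loop the per-token losses would be backpropagated through the student's parameters. The sketch only shows where the feedback comes from: every loss term is computed on a prefix sampled from the student, not on a prefix taken from teacher-written or reference data.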