Section 01
Introduction: The Quality-Utility Paradox Challenges Traditional Understanding of Knowledge Distillation
A paper accepted at ICML 2026 reports a counterintuitive finding: high-reward data refined by strong models (Oracles) is worse for small models' mathematical reasoning than data the small models generate and filter themselves. This phenomenon is called the 'Quality-Utility Paradox'. Its core cause is that Oracle refinement makes the small model's native reasoning distribution drift. To address this, the study proposes a style-aligned refinement method.