Section 01
Online Feedback Distillation: Enabling Small Models to Provide Reasoning Feedback Like Large Models (Introduction)
This article introduces Online Feedback Distillation, a knowledge distillation framework that addresses the feedback dilemma faced by reasoning models. Through online training, a lightweight model learns to imitate the expert feedback of a large model, which closes a self-improvement loop on reasoning tasks. The central idea is to replace the fixed amateur model with a student model that keeps learning. Three further design choices support it: a single unified model that plays two roles, adaptive gating of the knowledge distillation signal, and multi-objective Pareto analysis. Together these reduce inference cost while improving the quality of the small model's feedback. The project is open source on GitHub, supports multiple model configurations, and is designed with Apple Silicon users in mind.
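Multi-objective Pareto analysis compares candidate configurations on competing objectives and keeps only those that no other candidate beats on every objective. The article does not say which objectives or metrics the project uses, so the following is a minimal, generic sketch that assumes two objectives: inference cost (lower is better) and feedback quality (higher is better). The `Config` type, its fields, and all numbers are hypothetical and serve only to show the mechanics.

```python
from typing import List, NamedTuple


class Config(NamedTuple):
    """One candidate model configuration (hypothetical fields for illustration)."""
    name: str
    cost: float      # assumed objective: inference cost per query, lower is better
    quality: float   # assumed objective: feedback-quality score, higher is better


def dominates(a: Config, b: Config) -> bool:
    """True if `a` is no worse than `b` on both objectives and strictly better on at least one."""
    no_worse = a.cost <= b.cost and a.quality >= b.quality
    strictly_better = a.cost < b.cost or a.quality > b.quality
    return no_worse and strictly_better


def pareto_front(configs: List[Config]) -> List[Config]:
    """Return the configurations that no other configuration dominates, in input order."""
    # dominates(c, c) is always False, so comparing c with itself is harmless.
    return [c for c in configs if not any(dominates(other, c) for other in configs)]


if __name__ == "__main__":
    # Made-up numbers, purely to illustrate how dominated configurations drop out.
    candidates = [
        Config("large-teacher", cost=10.0, quality=0.90),
        Config("small-fixed-amateur", cost=1.0, quality=0.55),
        Config("small-distilled-student", cost=1.0, quality=0.70),
        Config("medium-baseline", cost=5.0, quality=0.65),
    ]
    for c in pareto_front(candidates):
        print(c.name)  # prints "large-teacher", then "small-distilled-student"
```

With these invented numbers, the distilled student dominates both the fixed amateur (same cost, higher quality) and the medium baseline (lower cost, higher quality). The large teacher remains on the front only because nothing else matches its quality. A real analysis would substitute whatever cost and quality measurements the project actually reports.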