Section 01
Distilling Reasoning into Small Qwen Models: Combining SFT and On-Policy Distillation in Practice (Introduction)
This project explores how to transfer the reasoning capabilities of large models to small Qwen models by combining Supervised Fine-Tuning (SFT) with on-policy distillation, with the goal of efficient inference on edge devices. Its core idea is a "learning by doing" form of distillation. The student model generates its own reasoning traces and is optimized using real-time feedback from the teacher model on those traces, rather than only imitating fixed, teacher-written outputs as in traditional distillation. (Original author: kakopappa, Source: GitHub, Release date: 2026-06-13)
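The source does not include code. The sketch below is a minimal illustration that assumes a common formulation, not necessarily the author's. In this formulation, SFT minimizes cross-entropy on a fixed, teacher-written trace (teacher forcing). On-policy distillation instead lets the student sample its own continuation and minimizes the per-token reverse KL divergence between student and teacher on the prefixes the student actually visited. The toy vocabulary `VOCAB` and the functions `teacher_fn`, `student_fn`, `sft_loss`, and `on_policy_loss` are hypothetical stand-ins for real models. Reverse KL is an assumed choice of objective, and no gradients are computed.

```python
import math
import random

VOCAB = 3  # hypothetical toy vocabulary size

def log_softmax(logits):
    # Numerically stable log-softmax over a list of logits.
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def teacher_fn(prefix):
    """Hypothetical teacher: strongly prefers token (last + 1) % VOCAB."""
    logits = [0.0] * VOCAB
    logits[(prefix[-1] + 1) % VOCAB] = 3.0
    return logits

def student_fn(prefix):
    """Hypothetical untrained student: uniform over the vocabulary."""
    return [0.0] * VOCAB

def sft_loss(student, prompt, target):
    """Stage 1 (SFT): mean negative log-likelihood of a fixed, teacher-written target."""
    seq, nll = list(prompt), 0.0
    for tok in target:
        nll -= log_softmax(student(seq))[tok]
        seq.append(tok)  # teacher forcing: prefixes come from the target, not the student
    return nll / len(target)

def reverse_kl(student_logits, teacher_logits):
    """Exact KL(student || teacher) between two next-token distributions."""
    ls, lt = log_softmax(student_logits), log_softmax(teacher_logits)
    return sum(math.exp(a) * (a - b) for a, b in zip(ls, lt))

def on_policy_loss(student, teacher, prompt, max_len, rng):
    """Stage 2 (on-policy): the student samples its own continuation, and the
    teacher scores every prefix the student actually visited."""
    seq, per_token = list(prompt), []
    for _ in range(max_len):
        s_logits = student(seq)
        per_token.append(reverse_kl(s_logits, teacher(seq)))
        probs = [math.exp(x) for x in log_softmax(s_logits)]
        seq.append(rng.choices(range(len(probs)), weights=probs, k=1)[0])
    return seq, sum(per_token) / len(per_token)

rng = random.Random(0)
print("SFT loss:", sft_loss(student_fn, [0], [1, 2, 0]))
seq, loss = on_policy_loss(student_fn, teacher_fn, [0], 4, rng)
print("student rollout:", seq, "on-policy loss:", loss)
```

The key difference is where the prefixes come from. In SFT they are fixed teacher-written tokens. In the on-policy stage they are tokens the student sampled itself, so the teacher's feedback covers the states the student actually reaches at inference time. A student identical to the teacher gets an on-policy loss of exactly 0. In a real pipeline both losses would be computed from model log-probabilities in a framework such as PyTorch and backpropagated through the student only.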