Section 01
[Main Floor/Introduction] Reasoning Trace Distillation: Can Small Models Learn to Think?
This project asks one core question: can small models learn to think? It explores whether complex reasoning ability can be transferred to the small Qwen3 1.7B model by distilling reasoning traces from DeepSeek-R1. By comparing five training configurations (baseline, SFT trace distillation, RL-verified trace re-distillation, pure GRPO reinforcement learning, and two-stage hybrid training), it aims to identify practical paths for developing reasoning in small models.