Section 01
Introduction: DistillReasoning—Low-Cost Transfer of Trillion-Scale Model Reasoning Capabilities to a 4B Small Model
The DistillReasoning project demonstrates an efficient distillation method that transfers reasoning capabilities from ultra-large teacher models, at 744B and 1T parameters, to a student model with only 4B parameters. The full training run costs roughly $14 in compute, and the resulting small model runs on a laptop while achieving reasoning performance close to that of the large models. This offers a new path toward AI democratization and edge deployment.
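Transfer from teachers this large is commonly done as black-box distillation: the student is fine-tuned on reasoning traces generated by the teacher, with an ordinary causal-LM loss. The sketch below illustrates that general pattern only; the introduction does not detail DistillReasoning's actual recipe, and the student checkpoint, data format, and hyperparameters here are illustrative assumptions.

```python
# A minimal sketch of trace-based distillation: fine-tune a small student on
# teacher-generated reasoning traces. All names and values are assumptions,
# not the project's actual configuration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

student_name = "Qwen/Qwen3-4B"  # hypothetical 4B student; the source names no checkpoint
tokenizer = AutoTokenizer.from_pretrained(student_name)
student = AutoModelForCausalLM.from_pretrained(student_name)
student.train()

# Hypothetical corpus of (question, teacher-generated reasoning trace) pairs.
corpus = [
    ("What is 17 * 24?",
     "Break it down: 17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408."),
]

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)
for question, trace in corpus:
    text = f"Question: {question}\nReasoning: {trace}{tokenizer.eos_token}"
    batch = tokenizer(text, return_tensors="pt")
    # Standard next-token objective on the teacher's trace: the student learns
    # to reproduce the teacher's step-by-step reasoning, not just its answers.
    out = student(**batch, labels=batch["input_ids"])
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

Because the teacher only contributes text, this approach never loads the 744B or 1T models locally, which is consistent with the low compute cost the project reports.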