Section 01
Cultivating Reasoning Capabilities in Small Models: Core Findings and Methodology Overview
This article presents the findings of the open-source project small-LM-reasoning-posttraining: a small Transformer trained from scratch can acquire arithmetic reasoning ability through a carefully designed curriculum and post-training strategy. The core finding is that curriculum design matters far more than applying RL early. The model should first build basic capabilities through targeted curriculum SFT and only then be refined with KL-regularized RL. The final strategy raises the small model's arithmetic reasoning accuracy from 80.7% to 90.7%, and the project provides a reproducible research framework that is also a useful reference for large-model training.
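For readers new to the term, KL-regularized RL generally means optimizing the task reward minus a penalty for drifting away from a reference policy, which in a pipeline like this one would typically be the SFT checkpoint. The sketch below illustrates that general idea only. The function name `kl_regularized_reward`, the default `beta`, and the simple sum-of-log-ratios KL estimate are illustrative assumptions, not taken from the project's code.

```python
from typing import Sequence


def kl_regularized_reward(
    task_reward: float,
    policy_logprobs: Sequence[float],
    ref_logprobs: Sequence[float],
    beta: float = 0.1,  # hypothetical penalty weight, not from the project
) -> float:
    """Return the task reward minus a KL penalty for one sampled sequence.

    policy_logprobs / ref_logprobs are the per-token log-probabilities that
    the current policy and the frozen reference model assign to the tokens
    that were actually sampled.
    """
    if len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("log-prob sequences must have the same length")

    # Single-sample estimate of KL(policy || reference):
    # the sum over tokens of log pi(token) - log pi_ref(token).
    kl_estimate = sum(p - r for p, r in zip(policy_logprobs, ref_logprobs))

    # A larger beta keeps the policy closer to the reference model.
    return task_reward - beta * kl_estimate


# Example: a correct answer (reward 1.0) whose tokens the policy now
# rates more likely than the reference model does.
shaped = kl_regularized_reward(1.0, [-0.5, -0.5], [-1.0, -1.0], beta=0.1)
```

When the policy and the reference agree on every token, the penalty vanishes and the shaped reward equals the task reward. This is one way to see why a strong SFT starting point matters: the penalty anchors RL to whatever the reference model already does.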