Section 01
Tiny Think Research Guide: Exploring Reasoning-Prior Post-Training for 140M-Parameter Small Models with Single-GPU Training
Tiny Think is a post-training study of reasoning in 140M-parameter ultra-small language models. Using a single consumer-grade GPU, it examines how Supervised Fine-Tuning (SFT) and preference optimization (DPO/APO) affect mathematical and general reasoning, and it documents a capability trade-off in post-training: a "capability tax" in which gains on targeted tasks come at the cost of degraded general abilities. The work is motivated by the practical value of edge deployment; the code, models, and paper are open-source.
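For orientation, the sketch below shows the standard DPO objective (Rafailov et al., 2023), the kind of preference-optimization loss applied after SFT in studies like this one. This is a minimal illustration in plain PyTorch, not code from the Tiny Think repository; all function and variable names are hypothetical.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Direct Preference Optimization loss (illustrative sketch).

    Each tensor holds the summed token log-probabilities of the chosen
    or rejected response under the trainable policy or the frozen SFT
    reference model, one entry per preference pair in the batch.
    """
    # Implicit reward of each response: how much more the policy
    # favors it relative to the reference model.
    chosen_reward = policy_chosen_logps - ref_chosen_logps
    rejected_reward = policy_rejected_logps - ref_rejected_logps

    # Logistic loss that widens the margin between chosen and rejected;
    # beta controls how far the policy may drift from the reference.
    logits = beta * (chosen_reward - rejected_reward)
    return -F.logsigmoid(logits).mean()
```

Because the reference model is frozen and only per-sequence log-probabilities are needed, this objective fits comfortably in single-GPU memory budgets at the 140M scale, which is part of what makes preference optimization practical in this setting.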