Section 01
[Introduction] Math-SLM: Efficient Training of a Small Math Reasoning Model in 3.5 Hours
This project, published on GitHub by debtirthasaha (https://github.com/debtirthasaha/math-slm), shows how to fine-tune DeepSeek-R1-Distill-Qwen-7B for math reasoning in just 3.5 hours on 8 H100 GPUs. The core strategy combines SFT (Supervised Fine-Tuning), DPO (Direct Preference Optimization), and LoRA (Low-Rank Adaptation), giving an efficient recipe for model training in resource-constrained scenarios.
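To make two of these ingredients concrete, here is a minimal sketch in plain Python. It is not taken from the math-slm repository: the function names, the 4096 x 4096 layer size, the rank of 16, and beta = 0.1 are all illustrative assumptions. The first function counts the parameters LoRA trains for one weight matrix, and the second evaluates the standard DPO loss for a single preference pair from sequence log-probabilities.

```python
import math


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two LoRA factors: A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) pair: -log(sigmoid(beta * margin)).

    The margin measures how much more the policy prefers the chosen answer
    over the rejected one, relative to the frozen reference model.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = beta * margin
    # -log(sigmoid(x)) == log(1 + exp(-x)), written in a numerically stable form.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


# Hypothetical layer size and rank, chosen only for illustration.
full = 4096 * 4096
lora = lora_trainable_params(4096, 4096, rank=16)
print(f"LoRA trains {lora} of {full} parameters ({lora / full:.2%})")

# When the policy matches the reference, the margin is 0 and the loss is log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
```

Under these assumed numbers the low-rank factors hold well under 1% of the full matrix's parameters, which is the reason LoRA keeps a short training run affordable. The DPO loss falls below log(2) as soon as the policy favors the chosen answer more strongly than the reference model does.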