Section 01
SimpleRL-Zoo Project Overview: Minimalist RL Method Significantly Enhances Mathematical Reasoning of Foundation Models
The SimpleRL-Zoo project, open-sourced by the NLP Lab at Hong Kong University of Science and Technology, demonstrates an efficient training recipe: using only 8K mathematical training samples and a rule-based reward function, it achieves absolute accuracy improvements of 10 to 20 percentage points on mathematical reasoning tasks across 10 different open-source foundation models (spanning 0.5B to 32B parameters, including the Llama3, Mistral, DeepSeekMath, and Qwen2.5 series).
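A rule-based reward needs no learned reward model: it simply checks whether the model's final answer matches the reference. The sketch below illustrates the general idea under assumed conventions (answers wrapped in `\boxed{...}`, exact string match after normalization); the function names and matching rules are illustrative, not the project's actual implementation.

```python
import re

def extract_boxed_answer(text):
    """Return the content of the last \\boxed{...} in a response, or None.

    Assumes the model is prompted to put its final answer in \\boxed{...};
    this convention is an illustrative assumption, not SimpleRL-Zoo's exact spec.
    """
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None

def rule_based_reward(response, ground_truth):
    """Binary rule-based reward: 1.0 for a correct answer, 0.0 otherwise.

    No neural reward model is involved; correctness is decided purely by
    extracting the answer and comparing it to the reference string.
    """
    answer = extract_boxed_answer(response)
    if answer is None:
        return 0.0  # unparseable response earns no reward
    return 1.0 if answer == ground_truth.strip() else 0.0

# Example usage
print(rule_based_reward("Thus the result is \\boxed{42}.", "42"))  # 1.0
print(rule_based_reward("I think it is 41.", "42"))                # 0.0
```

In practice, rule-based checkers often add numeric or symbolic equivalence (e.g. treating `0.5` and `1/2` as equal), but even this minimal exact-match form gives the verifiable signal that makes RL on math data work without human preference labels.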