Section 01
Introduction: Core Overview of the OneShotTrainingExample Project
OneShotTrainingExample is a unified workspace that combines the GHPO/Open-R1 training code with one-shot RLVR selector experiments. Its goal is to improve mathematical reasoning models efficiently by using the one-shot RLVR selector, and it provides a complete workflow for training, evaluation, and analysis. The project targets two problems: traditional supervised fine-tuning (SFT) tends to fall short on deep reasoning, and reinforcement learning (RL) training is resource-intensive and difficult to tune.