Section 01
[Introduction] Summary of a Two-Dimensional Study of Training Data Allocation for RLVR Reasoning
This study examines data allocation strategies for training reasoning with RLVR (reinforcement learning with verifiable rewards). Using a synthetic knowledge graph environment, it systematically analyzes the effect of two dimensions: reasoning depth and environmental complexity. Key findings: (1) allocation strategies that cover both dimensions outperform strategies that vary only a single axis; (2) the reasoning tasks fall into two distinct clusters, inductive-analogical and deductive-abductive; (3) strategies that uniformly mix samples of different difficulty levels perform better than the other mixing strategies considered. Together, these findings offer design principles for improving a model's overall reasoning capability.
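To make the idea of uniform mixing across both dimensions concrete, the following is a minimal sketch, not the study's actual procedure. It assumes each training sample carries two hypothetical integer fields, `depth` (reasoning depth) and `complexity` (environmental complexity), and it treats each (depth, complexity) pair as one difficulty cell. The function name `uniform_mix` and the sample layout are illustrative assumptions.

```python
import random
from collections import Counter, defaultdict


def uniform_mix(pool, budget, seed=0):
    """Draw `budget` samples spread as evenly as possible over
    (depth, complexity) cells. Field names are hypothetical."""
    rng = random.Random(seed)

    # Group the pool by its position on the two-dimensional grid.
    cells = defaultdict(list)
    for sample in pool:
        cells[(sample["depth"], sample["complexity"])].append(sample)

    keys = sorted(cells)
    base, extra = divmod(budget, len(keys))

    batch = []
    for i, key in enumerate(keys):
        # The first `extra` cells absorb the remainder of the budget.
        quota = base + (1 if i < extra else 0)
        # Sampling with replacement lets small cells still fill their quota.
        batch.extend(rng.choices(cells[key], k=quota))

    rng.shuffle(batch)
    return batch


# Toy pool: 3 depth levels x 2 complexity levels, 5 samples per cell.
pool = [
    {"depth": d, "complexity": c, "id": f"d{d}-c{c}-{i}"}
    for d in (1, 2, 3)
    for c in (1, 2)
    for i in range(5)
]

batch = uniform_mix(pool, budget=12)
counts = Counter((s["depth"], s["complexity"]) for s in batch)
```

With this toy pool, the 12-sample budget is split evenly over the six cells. A single-axis scheme would correspond to fixing one of the two fields and varying only the other.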