Section 01
Introduction / Original Post: HIVE: Dynamically Selecting High-Value Prompts at the 'Learning Edge' to Improve RL Training Efficiency
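The HIVE framework targets the 'learning edge', the region of prompts that are of medium difficulty and high uncertainty. It locates this region in two steps: it uses each prompt's historical reward trajectory to judge difficulty, then filters by real-time prompt entropy to judge uncertainty. Training on the selected prompts makes reinforcement learning on mathematical reasoning tasks more efficient.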
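The two-stage selection described above can be sketched roughly as follows. This is a minimal illustration and not HIVE's actual implementation: the names `PromptRecord`, `is_medium_difficulty`, and `select_learning_edge`, the reward band `[0.2, 0.8]`, the `keep_fraction` parameter, and the choice to keep prompts with no history are all assumptions made for the example. The summary does not say how prompt entropy is computed, so the sketch takes it as a precomputed score per prompt.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PromptRecord:
    """Hypothetical container for one prompt's training history."""
    prompt_id: str
    # Mean reward per past visit, each in [0, 1] (e.g. fraction of correct rollouts).
    reward_history: List[float] = field(default_factory=list)


def is_medium_difficulty(record: PromptRecord,
                         low: float = 0.2,
                         high: float = 0.8) -> bool:
    """Stage 1 (assumed rule): keep prompts whose average past reward is
    neither near 0 (too hard) nor near 1 (too easy)."""
    if not record.reward_history:
        return True  # assumption: prompts with no history stay in the pool
    mean_reward = sum(record.reward_history) / len(record.reward_history)
    return low <= mean_reward <= high


def select_learning_edge(records: List[PromptRecord],
                         prompt_entropy: Dict[str, float],
                         keep_fraction: float = 0.5) -> List[str]:
    """Stage 2 (assumed rule): among medium-difficulty prompts, keep the
    top `keep_fraction` ranked by current prompt entropy."""
    candidates = [r for r in records if is_medium_difficulty(r)]
    if not candidates:
        return []
    ranked = sorted(candidates,
                    key=lambda r: prompt_entropy.get(r.prompt_id, 0.0),
                    reverse=True)
    keep = max(1, int(len(ranked) * keep_fraction))
    return [r.prompt_id for r in ranked[:keep]]


records = [
    PromptRecord("easy", [1.0, 1.0]),
    PromptRecord("hard", [0.0, 0.0]),
    PromptRecord("mid_a", [0.5, 0.4]),
    PromptRecord("mid_b", [0.6, 0.5]),
]
entropy = {"easy": 2.0, "hard": 2.0, "mid_a": 1.5, "mid_b": 0.3}
selected = select_learning_edge(records, entropy, keep_fraction=0.5)
```

In this toy run the always-solved and never-solved prompts are dropped by the history filter even though their entropy is high, and only the higher-entropy medium-difficulty prompt survives the second stage. Real thresholds and the entropy measure would have to come from the HIVE method itself.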