Section 01
[Introduction] Task Reward-Driven RL: Key Findings Beyond Distribution Sharpening
Through theoretical analysis and experimental validation, this article reveals the inherent limitations of distribution sharpening methods. It demonstrates that reinforcement learning (RL) driven by task rewards is not merely distribution sharpening that "activates" capabilities the model already has. Instead, it is a genuine learning process: it delivers more robust performance improvements along a stable learning trajectory, and it can inject new reasoning patterns and problem-solving strategies into the model.