Section 01
[Introduction] GRPO-VPS: Verifiable Process Supervision Improves LLM Reasoning Efficiency and Accuracy
This article presents GRPO-VPS (Verifiable Process Supervision), a method that provides fine-grained process supervision by detecting changes in the model's belief as it reasons. It requires neither an additional model nor Monte Carlo sampling. On mathematical reasoning tasks, it improves accuracy by 2.6% and reduces reasoning length by 13.7%, balancing reasoning effectiveness with efficiency.