Zing Forum

Reading

Robust Reasoning Under Noisy Supervision: Online Label Refinement Enables LLMs to Self-Correct in Mislabeled Data

This paper systematically analyzes how label noise affects RLVR (reinforcement learning with verifiable rewards) training and proposes Online Label Refinement (OLR), a method that progressively corrects mislabeled examples through majority voting and dynamic consistency detection. It reports significantly improved model robustness even at noise ratios as high as 90%.

Tags: Reinforcement learning · Noisy labels · Reasoning models · Label refinement · Robustness · Self-correction
Published 2026-04-05 14:30 · Recent activity 2026-04-07 10:52 · Estimated read 1 min

Section 01

Introduction / Main Post: Robust Reasoning Under Noisy Supervision: Online Label Refinement Enables LLMs to Self-Correct in Mislabeled Data

This paper systematically analyzes how label noise affects RLVR (reinforcement learning with verifiable rewards) training and proposes Online Label Refinement (OLR), a method that progressively corrects mislabeled examples through majority voting and dynamic consistency detection. It reports significantly improved model robustness even at noise ratios as high as 90%.
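To make the idea of majority voting combined with a consistency check more concrete, here is a minimal Python sketch. It is not the paper's implementation: the post does not give OLR's actual criteria, so the class name `LabelRefiner`, the parameters `vote_threshold` and `history_len`, and the rule "replace the label only when the same majority answer wins several checks in a row" are all assumptions made for illustration.

```python
from collections import Counter, deque


class LabelRefiner:
    """Hypothetical sketch of online label refinement (not the paper's code).

    Assumption: a label is replaced only when one sampled answer holds a
    strong majority for `history_len` consecutive checks.
    """

    def __init__(self, vote_threshold=0.6, history_len=3):
        self.vote_threshold = vote_threshold  # minimum vote share (assumed)
        self.history_len = history_len        # consecutive wins needed (assumed)
        self.history = {}                     # example id -> recent majority answers

    @staticmethod
    def majority(answers):
        """Return the most common answer and its share of the votes."""
        answer, count = Counter(answers).most_common(1)[0]
        return answer, count / len(answers)

    def refine(self, example_id, current_label, answers):
        """Return the refined label, or the current one if there is no stable majority."""
        answer, share = self.majority(answers)
        recent = self.history.setdefault(
            example_id, deque(maxlen=self.history_len)
        )
        # A weak majority is recorded as None, which breaks the streak.
        recent.append(answer if share >= self.vote_threshold else None)
        stable = len(recent) == self.history_len and all(
            a == answer for a in recent
        )
        return answer if stable else current_label


refiner = LabelRefiner(vote_threshold=0.6, history_len=2)
noisy_label = "7"  # assumed to be a mislabel in this toy example
first = refiner.refine("q1", noisy_label, ["12", "12", "12", "7"])
second = refiner.refine("q1", first, ["12", "12", "7", "12"])
```

In this toy run the first call keeps the noisy label because there is only one observation so far, while the second call switches to the majority answer once it has won twice in a row. Requiring a streak rather than a single vote is one plausible way to avoid overwriting a correct label on the basis of one unlucky batch of samples.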