Section 01
[Introduction] DenoiseRL: A Bootstrapping Framework for Reasoning Models Without Strong Supervision
DenoiseRL: An Innovative Framework That Learns from Errors
DenoiseRL is a reinforcement learning framework that does not rely on strong supervision. Its core idea is to learn recovery strategies from the erroneous reasoning traces of weak models, which removes the dependence on strong teacher models and carefully curated datasets. The framework consistently outperforms existing baselines on mathematical and general reasoning benchmarks. The research was published on arXiv on May 27, 2026 (original link: http://arxiv.org/abs/2605.28421v1).