Section 01
[Introduction] HSIR: Making Self-Improvement of Large Reasoning Models Both Efficient and Effective
Core Information
- Source: Paper "Better, Faster: Harnessing Self-Improvement in Large Reasoning Models", published on arXiv on May 24, 2026 (Link: http://arxiv.org/abs/2605.24998v1)
- Core Problems: Self-improvement of large reasoning models faces two major problems: data imbalance (many easy samples, few difficult ones) and overthinking (redundant reasoning steps)
- Solution: HSIR takes a two-pronged approach: a "Verify-Exit" sampling strategy and intrinsic diversity scoring
- Effects: Average reasoning performance improves by 10.9%, relative inference overhead drops by 42.4%, and the method applies to multiple post-training paradigms