Zing Forum


R-HORIZON: Uncovering Long-Range Reasoning Bottlenecks and Breakthrough Paths for Large Reasoning Models

R-HORIZON, a work by Meituan's LongCat team accepted at ICLR 2026, constructs a long-range reasoning benchmark via a problem-combination method, reveals how current large models degrade on multi-step dependent reasoning, and provides an effective training-based improvement path.

Tags: R-HORIZON · Meituan · Long-range reasoning · ICLR 2026 · Reasoning models · Benchmarks · Problem combination · DeepSeek-R1 · Reinforcement learning · GRPO
Published 2026-04-02 14:25 · Last activity 2026-04-02 14:52 · Estimated read: 5 min

Section 01

Introduction

R-HORIZON, a work by Meituan's LongCat team accepted at ICLR 2026, constructs a long-range reasoning benchmark via a problem-combination method, reveals how current large models degrade on multi-step dependent reasoning, and provides an effective training-based improvement path, with clear significance for evaluating and optimizing AI reasoning ability.


Section 02

Blind Spot of Existing Reasoning Benchmarks: Disconnect Between Single-Step Tasks and Real Scenarios

Mainstream reasoning benchmarks (such as MATH and AIME) focus on independent, single-step reasoning tasks with mutually isolated samples, so they cannot simulate complex real-world scenarios with multi-step dependencies (e.g., prerequisite steps in scientific experiments, or interactions between software development modules). As a result, they fail to measure a model's real long-range dependent reasoning ability, leaving a blind spot in performance assessment.


Section 03

Core Innovation of R-HORIZON: Constructing Long-Range Reasoning Scenarios via Problem Combination

R-HORIZON proposes the Query Combination method for constructing long-range reasoning tasks, in three steps:

1. Filter problems that contain valid integers (to ensure variable replacement is feasible);
2. Identify key variables (which serve as connectors between problems);
3. Concatenate the problems into a dependency chain (each step's answer becomes a parameter of the next, enforcing long-range logical consistency).
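The three steps above can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: `has_valid_integer`, `combine`, and the `<answer of problem i>` placeholder format are all hypothetical names chosen for this example.

```python
import re

def has_valid_integer(problem: str) -> bool:
    """Step 1: keep only problems whose statement contains an integer
    literal that can later be replaced by a previous answer."""
    return re.search(r"\b\d+\b", problem) is not None

def combine(problems: list[str]) -> str:
    """Steps 2-3: treat the first integer in each follow-up problem as the
    key variable, replace it with a reference to the previous answer, and
    concatenate everything into one chained multi-step query."""
    assert all(has_valid_integer(p) for p in problems[1:])
    parts = [problems[0]]
    for i, p in enumerate(problems[1:], start=1):
        # The answer to problem i becomes a parameter of problem i+1.
        parts.append(re.sub(r"\b\d+\b", f"<answer of problem {i}>", p, count=1))
    return "\n".join(f"Problem {i}: {p}" for i, p in enumerate(parts, start=1))

print(combine(["Compute 3 + 4.", "Multiply 7 by 2."]))
```

Because the placeholder can only be resolved by actually solving the previous problem, a model cannot answer any later sub-problem correctly without carrying its earlier results through the whole chain.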


Section 04

Benchmark Results: Significant Performance Degradation of All Models in Long-Range Reasoning

Evaluation of more than 20 advanced models shows a sharp performance drop for all of them on long-range reasoning. Taking DeepSeek-R1 as an example: its pass rate on single AIME25 problems is 87.3%, but falls to only 24.6% on 5 concatenated problems. Larger models degrade more gracefully, code-generation tasks degrade more steeply, and models allocate their thinking budget unevenly across sub-problems.
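A back-of-envelope check (our calculation, not from the paper) shows the drop is worse than error accumulation alone would predict: if the five sub-problems failed independently, a 87.3% single-problem pass rate would still yield about 50.7% on the chain, yet the observed rate is 24.6%.

```python
single_pass = 0.873             # DeepSeek-R1 pass rate on single AIME25 problems
n = 5                           # number of concatenated problems
independent = single_pass ** n  # prediction if sub-problems failed independently
observed = 0.246                # reported pass rate on 5 concatenated problems

print(f"independent-failure prediction: {independent:.1%}")
print(f"observed: {observed:.1%} ({observed / independent:.2f}x the prediction)")
```

The gap between the two numbers is what makes concatenation a genuine long-range test: errors compound through the chain rather than occurring independently per sub-problem.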


Section 05

Training Improvement Solution: Enhancing Long-Range Reasoning Ability via Reinforcement Learning

The team trained models on R-HORIZON combined data with GRPO reinforcement learning. Training on 2-problem combinations improved AIME24 (n=2) by 17.4 points and single-problem performance by 7.5 points (positive transfer); training on n=4 combinations raised the MATH500 (n=8) pass rate from 8.4% to 50.6%, demonstrating the effectiveness of the approach.
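GRPO's central ingredient, the group-relative advantage, can be sketched as below. This is a generic illustration of the estimator, not the team's training code, and the reward choice in the example (fraction of sub-problems solved in a combined query) is our illustrative assumption.

```python
import statistics

def grpo_advantages(rewards: list[float]) -> list[float]:
    """GRPO replaces a learned value baseline with group statistics: each
    sampled completion's advantage is its reward normalized by the mean
    and standard deviation of rewards within the same group of rollouts."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]

# Example: 4 rollouts on a combined query, rewarded by the fraction of
# sub-problems answered correctly (an illustrative reward design).
print(grpo_advantages([1.0, 0.5, 0.0, 0.5]))
```

Rollouts that solve more of the chain than their group peers get a positive advantage and are reinforced, which is how combined-query training pressures the model to sustain correctness across many dependent steps.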


Section 06

Implications for AI Development: Redefining Reasoning Evaluation and Scaling Directions

This research carries three implications: 1) a more comprehensive long-range reasoning evaluation framework is needed; 2) reasoning-chain length emerges as a new dimension of the Scaling Law; 3) it lays theoretical and data foundations for Agent systems that plan and execute over many steps.


Section 07

Open-Source Contributions: Promoting the Development of the Long-Range Reasoning Research Community

The team has open-sourced the paper (arXiv:2510.08189), the benchmark datasets (on Hugging Face, including subsets such as MATH500), the combined training data, and the trained models, helping researchers reproduce and build on the results.


Section 08

Conclusion: Challenges and Future Directions of Long-Range Reasoning

R-HORIZON reveals the capability boundary of current large models in long-range reasoning, but also proves that significant improvement can be achieved through targeted training. We look forward to the community using open-source resources to jointly push AI reasoning ability to new heights.