# R-HORIZON: Uncovering Long-Range Reasoning Bottlenecks and Breakthrough Paths for Large Reasoning Models

> R-HORIZON, a work from Meituan's LongCat team accepted at ICLR 2026, builds a long-range reasoning benchmark by combining problems into dependent chains, shows that current large reasoning models degrade sharply on multi-step dependent reasoning, and offers a training approach that mitigates the drop.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-04-02T06:25:37.000Z
- Last activity: 2026-04-02T06:52:22.736Z
- Popularity: 163.6
- Keywords: R-HORIZON, Meituan, long-range reasoning, ICLR 2026, reasoning models, benchmark, problem combination, DeepSeek-R1, reinforcement learning, GRPO
- Page link: https://www.zingnex.cn/en/forum/thread/r-horizon
- Canonical: https://www.zingnex.cn/forum/thread/r-horizon
- Markdown source: floors_fallback

---

## [Introduction] R-HORIZON: Uncovering Long-Range Reasoning Bottlenecks and Breakthrough Paths for Large Reasoning Models

R-HORIZON comes from Meituan's LongCat team and was accepted at ICLR 2026. It builds a long-range reasoning benchmark by combining existing problems into dependent chains, shows that current large reasoning models degrade sharply when each step depends on the previous one, and proposes a training approach that narrows the gap. The work matters both for evaluating AI reasoning ability and for optimizing models.

## Blind Spot of Existing Reasoning Benchmarks: Disconnect Between Single-Step Tasks and Real Scenarios

Mainstream reasoning benchmarks such as MATH and AIME focus on independent single-step reasoning tasks. Because the samples are isolated from one another, they cannot simulate real-world work in which steps depend on each other, for example the preparatory steps of a scientific experiment or the interactions between modules in software development. As a result, these benchmarks cannot measure a model's long-range dependent reasoning ability, which leaves a blind spot in performance assessment.

## Core Innovation of R-HORIZON: Constructing Long-Range Reasoning Scenarios via Problem Combination

R-HORIZON proposes the **Query Combination** method to construct long-range reasoning tasks. The process has three steps:

1. Filter for problems that contain valid integers, so that variable replacement is feasible.
2. Identify a key variable in each problem to act as the connector between problems.
3. Concatenate the problems into a dependency chain: the answer to the previous problem becomes a parameter of the next one, which enforces long-range logical consistency.
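The three steps can be illustrated with a minimal sketch. It is not the team's implementation: the `Problem` class, the `compose` function, the choice of the first integer as the key variable, and the offset-based replacement (which keeps each original answer valid) are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Problem:
    text: str    # problem statement
    answer: int  # integer ground-truth answer


def first_integer(text: str) -> Optional[re.Match]:
    """Find the first standalone integer (not part of a word or a decimal)."""
    return re.search(r"(?<![\w.])\d+(?![\w]|\.\d)", text)


def compose(problems: List[Problem]) -> Tuple[str, List[int]]:
    """Chain problems so that each one depends on the previous answer."""
    # Step 1: keep only problems that contain a replaceable integer.
    usable = [p for p in problems if first_integer(p.text)]
    parts = []
    for i, p in enumerate(usable):
        text = p.text
        if i > 0:
            # Step 2: pick a key variable (assumption: simply the first integer).
            m = first_integer(text)
            offset = int(m.group()) - usable[i - 1].answer
            sign = "+" if offset >= 0 else "-"
            # Step 3: rewrite the variable in terms of the previous answer.
            # The offset keeps the variable's value, so the answer is unchanged.
            link = f"([answer to Problem {i}] {sign} {abs(offset)})"
            text = text[: m.start()] + link + text[m.end():]
        parts.append(f"Problem {i + 1}: {text}")
    return "\n".join(parts), [p.answer for p in usable]


chain, answers = compose([
    Problem("What is 3 + 4?", 7),
    Problem("A rectangle has width 10 and height 2. What is its area?", 20),
])
print(chain)
print(answers)
```

In this toy example the width `10` in the second problem is rewritten as `([answer to Problem 1] + 3)`, so a model can only solve Problem 2 after solving Problem 1 correctly.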

## Benchmark Results: Significant Performance Degradation of All Models in Long-Range Reasoning

An evaluation of more than 20 advanced models shows that every model suffers a sharp performance drop in long-range reasoning. DeepSeek-R1, for example, has a pass rate of 87.3% on single AIME25 problems, which falls to only 24.6% when 5 problems are concatenated. Larger models are more resilient, while degradation is steeper on code generation tasks. Models also allocate their thinking resources unevenly across the problems in a chain.
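To put the DeepSeek-R1 numbers in perspective, one rough baseline is to assume that the 5 problems are solved independently, each with the single-problem pass rate. The sketch below computes that baseline; the independence assumption and the function name `independent_chain_accuracy` are illustrative and do not come from the source.

```python
def independent_chain_accuracy(single_acc: float, n: int) -> float:
    """Chance of solving all n problems if each is solved independently."""
    return single_acc ** n


single_acc = 0.873  # DeepSeek-R1 on single AIME25 problems (from the text)
observed = 0.246    # DeepSeek-R1 on 5 concatenated problems (from the text)

expected = independent_chain_accuracy(single_acc, 5)
gap = expected - observed
print(f"expected {expected:.3f}, observed {observed:.3f}, gap {gap:.3f}")
# prints: expected 0.507, observed 0.246, gap 0.261
```

Under this assumption the observed pass rate is roughly half of what error compounding alone would predict, which suggests that the chained setting adds difficulty beyond the individual problems.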

## Training Improvement Solution: Enhancing Long-Range Reasoning Ability via Reinforcement Learning

The team trained models with GRPO reinforcement learning on R-HORIZON combined data. Training on 2-problem combinations improved AIME24 (n=2) by 17.4 points and single-problem performance by 7.5 points, a positive transfer to the standard setting. Training on n=4 combinations raised the MATH500 (n=8) pass rate from 8.4% to 50.6%, which indicates that the training method is effective.
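For readers unfamiliar with GRPO, its core idea is to score each sampled response relative to the other responses drawn for the same prompt. The sketch below shows only that normalization step. The all-or-nothing `chain_reward` is a hypothetical reward (the source does not specify the reward used), and the use of the population standard deviation is likewise an assumption.

```python
from statistics import mean, pstdev
from typing import List


def chain_reward(predicted: List[int], gold: List[int]) -> float:
    """Hypothetical reward: 1.0 only if every answer in the chain is correct."""
    return 1.0 if predicted == gold else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of responses to the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population standard deviation
    return [(r - mu) / (sigma + eps) for r in rewards]


gold = [7, 20]
samples = [[7, 20], [7, 18], [5, 20], [7, 20]]  # four sampled responses
rewards = [chain_reward(s, gold) for s in samples]
print(rewards)
print(group_relative_advantages(rewards))
```

Responses that solve the whole chain receive a positive advantage and the others a negative one; if every response in a group earns the same reward, all advantages are zero and that group contributes no learning signal.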

## Implications for AI Development: Redefining Reasoning Evaluation and Scaling Directions

This research has three implications:

1. A more comprehensive evaluation framework for long-range reasoning is needed.
2. The length of the reasoning chain emerges as a new dimension of the Scaling Law.
3. The work provides theoretical and data foundations for Agent systems, which rely on multi-step planning and execution.

## Open-Source Contributions: Promoting the Development of the Long-Range Reasoning Research Community

The team has open-sourced the paper (arXiv:2510.08189), the benchmark datasets (hosted on Hugging Face, including subsets such as Math500), the combined training data, and the trained models, so that researchers can reproduce and build on the results.

## Conclusion: Challenges and Future Directions of Long-Range Reasoning

R-HORIZON reveals the capability boundary of current large models in long-range reasoning, and it also shows that targeted training can deliver significant improvement. We look forward to the community using the open-source resources to push AI reasoning ability further.
