# R2R: Efficient Reasoning Path Exploration via Collaborative Routing Between Small and Large Models

> An overview of R2R, a NeurIPS 2025 paper that proposes a token routing mechanism for collaboration between small and large models, reducing computational cost while maintaining reasoning quality.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-04-02T09:55:55.000Z
- Last activity: 2026-04-02T10:21:57.976Z
- Popularity: 139.6
- Keywords: R2R, inference optimization, small-large model collaboration, token routing, efficient inference, model cascading, NeurIPS
- Page link: https://www.zingnex.cn/en/forum/thread/r2r
- Canonical: https://www.zingnex.cn/forum/thread/r2r

---

## R2R: Efficient Reasoning Path Exploration via Collaborative Routing Between Small and Large Models (Introduction)

R2R, a NeurIPS 2025 paper, addresses the high inference cost of large models with a token routing mechanism that lets a small model and a large model collaborate. It reports substantially lower computational cost (e.g., a 40-60% reduction on math tasks) while maintaining reasoning quality.

## Cost Dilemma of Large Model Inference (Background)

In complex reasoning tasks (chain-of-thought, multi-path exploration), large models generate many intermediate tokens, and every one of them requires a forward pass through the full model. Inference cost therefore rises steeply with the length of the reasoning trace, which restricts practical deployment. R2R aims to balance reasoning efficiency and quality.

## Core Mechanism and Architecture of R2R (Methodology)

Core Insight: Tokens in a reasoning trace are not equally important. Key decision points require the large model, while routine content can be generated by the small model.

The architecture has three components:

- Routing strategy network: a lightweight classifier that predicts the difficulty of each token.
- Small model: generates the simple tokens.
- Large model: generates the difficult tokens.

Strategy learning is self-supervised: the large model's output (the "golden path") serves as the reference for labeling difficult tokens, and the routing strategy is trained on those labels to balance accuracy against cost.
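The mechanism described above can be illustrated with a minimal Python sketch. It is not the paper's implementation. The toy models, the `toy_router` rule, the 0.5 threshold, and the labeling rule (a token counts as difficult when the small model fails to reproduce the large model's token given the same prefix) are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

Token = str
# A "model" here is any function mapping the tokens so far to the next token.
NextTokenFn = Callable[[List[Token]], Token]


def label_difficult_tokens(reference: List[Token], small_next: NextTokenFn) -> List[int]:
    """Assumed labeling rule: position i is difficult (1) when the small model,
    given the reference prefix, does not reproduce the reference token at i."""
    return [
        int(small_next(reference[:i]) != reference[i])
        for i in range(len(reference))
    ]


def routed_generate(
    prompt: List[Token],
    small_next: NextTokenFn,
    large_next: NextTokenFn,
    difficulty: Callable[[List[Token]], float],
    threshold: float = 0.5,
    max_new_tokens: int = 16,
    eos: Token = "<eos>",
) -> Tuple[List[Token], int]:
    """Generate token by token; the router picks which model runs at each step."""
    tokens = list(prompt)
    large_calls = 0
    for _ in range(max_new_tokens):
        if difficulty(tokens) >= threshold:
            nxt = large_next(tokens)  # difficult step: use the large model
            large_calls += 1
        else:
            nxt = small_next(tokens)  # routine step: use the small model
        tokens.append(nxt)
        if nxt == eos:
            break
    return tokens[len(prompt):], large_calls


# Hypothetical toy setup: the large model's output serves as the reference path.
REFERENCE = ["2", "+", "3", "=", "5", "<eos>"]


def toy_large(tokens: List[Token]) -> Token:
    return REFERENCE[len(tokens)]


def toy_small(tokens: List[Token]) -> Token:
    # The small model gets everything right except the final answer.
    tok = REFERENCE[len(tokens)]
    return "6" if tok == "5" else tok


def toy_router(tokens: List[Token]) -> float:
    # Stand-in for a trained classifier: flags the step right after "=".
    return 1.0 if tokens and tokens[-1] == "=" else 0.0


labels = label_difficult_tokens(REFERENCE, toy_small)
output, large_calls = routed_generate([], toy_small, toy_large, toy_router)
```

In this toy run only the answer token is labeled difficult, and the routed generation reproduces the reference path with a single large-model call. A real router would be a classifier trained on such labels rather than a hand-written rule.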

## Experimental Results Validate Win-Win of Efficiency and Quality (Evidence)

- Mathematical Reasoning (GSM8K, MATH): maintains similar accuracy with a 40-60% cost reduction.
- Code Generation (HumanEval): significant cost advantages, with slightly higher pass rates in some scenarios.
- Ablation experiments: the learned routing strategy is effective, while random or fixed-threshold strategies perform poorly.
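To get a feel for how a cost reduction in this range relates to the share of tokens sent to the large model, here is a back-of-the-envelope sketch. It assumes that per-token cost is proportional to parameter count, that each token is produced by exactly one model, and that the router's own cost is negligible. The 1.5B and 32B model sizes are hypothetical and do not come from the paper.

```python
def relative_cost(large_fraction: float, small_params: float, large_params: float) -> float:
    """Cost of routed decoding relative to always using the large model.

    Simplifying assumptions: cost per token scales with parameter count,
    and the router itself is free.
    """
    if not 0.0 <= large_fraction <= 1.0:
        raise ValueError("large_fraction must be in [0, 1]")
    small_ratio = small_params / large_params
    return large_fraction + (1.0 - large_fraction) * small_ratio


def max_large_fraction(target_reduction: float, small_params: float, large_params: float) -> float:
    """Largest share of tokens the large model may handle to hit a target cost reduction."""
    if small_params >= large_params:
        raise ValueError("small model must be smaller than large model")
    small_ratio = small_params / large_params
    target_cost = 1.0 - target_reduction
    if target_cost < small_ratio:
        raise ValueError("target is unreachable even with the small model alone")
    return (target_cost - small_ratio) / (1.0 - small_ratio)


# Hypothetical sizes in billions of parameters.
SMALL_B, LARGE_B = 1.5, 32.0
cost_at_40_percent_large = relative_cost(0.4, SMALL_B, LARGE_B)
share_for_40_reduction = max_large_fraction(0.40, SMALL_B, LARGE_B)
share_for_60_reduction = max_large_fraction(0.60, SMALL_B, LARGE_B)
```

Under these assumptions, a 40-60% cost reduction corresponds to the large model handling roughly 58% down to 37% of the tokens. Real measurements also depend on batching, KV-cache handling, and model-switching overhead, which this estimate ignores.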

## Application Scenarios and Deployment Recommendations (Suggestions)

Applicable Scenarios: cost-sensitive online services; edge devices (a local small model paired with a cloud large model); multi-tenant systems (routing adjusted according to each user's preference).

Deployment Recommendations: train the routing strategy on data from the target task, and establish a monitoring mechanism that tracks both quality and cost.
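The monitoring recommendation could look like the following sketch. The `RoutingMonitor` class, its thresholds, and the idea of a per-request pass/fail quality check are hypothetical choices for illustration, not something the paper prescribes.

```python
from collections import deque
from typing import Deque, List, Tuple


class RoutingMonitor:
    """Rolling window over recent requests, tracking cost and quality signals."""

    def __init__(self, window: int = 1000, max_large_share: float = 0.5,
                 min_quality: float = 0.9) -> None:
        self.max_large_share = max_large_share
        self.min_quality = min_quality
        # Each record: (tokens from large model, total tokens, quality check passed)
        self._records: Deque[Tuple[int, int, bool]] = deque(maxlen=window)

    def record(self, large_tokens: int, total_tokens: int, passed_check: bool) -> None:
        self._records.append((large_tokens, total_tokens, passed_check))

    def large_share(self) -> float:
        """Fraction of generated tokens that came from the large model (cost proxy)."""
        total = sum(t for _, t, _ in self._records)
        return sum(l for l, _, _ in self._records) / total if total else 0.0

    def quality(self) -> float:
        """Fraction of recent requests that passed the quality check."""
        n = len(self._records)
        return sum(ok for _, _, ok in self._records) / n if n else 1.0

    def alerts(self) -> List[str]:
        found = []
        if self.large_share() > self.max_large_share:
            found.append("cost")
        if self.quality() < self.min_quality:
            found.append("quality")
        return found


monitor = RoutingMonitor(window=3)
monitor.record(large_tokens=10, total_tokens=100, passed_check=True)
monitor.record(large_tokens=80, total_tokens=100, passed_check=True)
alerts_before = monitor.alerts()
monitor.record(large_tokens=90, total_tokens=100, passed_check=False)
alerts_after = monitor.alerts()
```

A "cost" alert suggests that the router is sending more traffic to the large model than budgeted, and a "quality" alert suggests that the threshold is too aggressive. In a multi-tenant system, one monitor per tenant would allow thresholds to follow each user's preference.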

## Limitations and Future Directions (Conclusion)

Limitations: Training the routing strategy requires golden outputs from the large model; multimodal reasoning remains to be explored.

Future Directions: weakly supervised or RL-based strategy training, multimodal extension, systems with more than two models, and global path optimization.

Conclusion: R2R offers a promising direction for LLM inference optimization, and careful system design is key to applying it in practice.