R2R: Efficient Reasoning Path Exploration via Collaborative Routing Between Small and Large Models

Introduces the NeurIPS 2025 paper R2R, which proposes a token routing mechanism for collaboration between small and large models, significantly reducing computational costs while maintaining reasoning quality.

Tags: R2R, inference optimization, small-large model collaboration, token routing, efficient inference, model cascade, NeurIPS
Published 2026-04-02 17:55 | Recent activity 2026-04-02 18:21 | Estimated read 4 min

Section 01

R2R: Efficient Reasoning Path Exploration via Collaborative Routing Between Small and Large Models (Introduction)

The NeurIPS 2025 paper R2R addresses the high inference cost of large models with a token routing mechanism in which a small model and a large model collaborate on the same reasoning path. It reports substantially lower computational cost (e.g., a 40-60% cost reduction on math tasks) while maintaining reasoning quality.

Section 02

Cost Dilemma of Large Model Inference (Background)

Complex reasoning tasks (chain-of-thought, multi-path exploration) make large models generate many intermediate tokens. Every one of those tokens requires a forward pass through the large model, so inference cost rises steeply with reasoning length and restricts practical deployment. R2R aims to balance reasoning efficiency and quality.

Section 03

Core Mechanism and Architecture of R2R (Methodology)

Core Insight: Tokens in a reasoning path differ in importance. Key decision points require the large model, while routine content can be produced by the small model.

Architecture: R2R has three components: a routing strategy network (a lightweight classifier that predicts token difficulty), a small model (for simple tokens), and a large model (for difficult tokens).

Strategy Learning: Training is self-supervised. The large model's own output serves as the golden path, difficult tokens are labeled against it, and the router is optimized to balance accuracy against cost.
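The routing loop and the labeling step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `label_difficult_tokens`, `routed_generate`, `NextToken` and `Router` are hypothetical, models are reduced to plain functions from a token prefix to the next token, and "difficult" is simplified to "the small model fails to reproduce the large model's golden token", which may differ from the criterion R2R actually uses.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-ins: a "model" maps a token prefix to the next token,
# and a "router" maps a token prefix to a difficulty score in [0, 1].
NextToken = Callable[[Sequence[str]], str]
Router = Callable[[Sequence[str]], float]


def label_difficult_tokens(golden: Sequence[str], small: NextToken) -> List[int]:
    """Self-supervised labels: position i is difficult (1) when the small
    model, fed the golden prefix, does not reproduce the golden token."""
    return [int(small(golden[:i]) != golden[i]) for i in range(len(golden))]


def routed_generate(
    prompt: Sequence[str],
    small: NextToken,
    large: NextToken,
    router: Router,
    threshold: float = 0.5,   # assumed decision threshold
    max_tokens: int = 32,
    eos: str = "<eos>",
) -> Tuple[List[str], int]:
    """Generate token by token, sending only difficult tokens to the large
    model. Returns the generated tokens and the number of large-model calls."""
    tokens = list(prompt)
    large_calls = 0
    for _ in range(max_tokens):
        if router(tokens) >= threshold:
            next_token = large(tokens)   # predicted difficult: use the large model
            large_calls += 1
        else:
            next_token = small(tokens)   # predicted simple: use the small model
        tokens.append(next_token)
        if next_token == eos:
            break
    return tokens[len(prompt):], large_calls
```

In this sketch the labels from `label_difficult_tokens` would be the training targets for the router classifier, and raising `threshold` shifts more tokens to the small model at the risk of lower quality.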

Section 04

Experimental Results Validate Win-Win of Efficiency and Quality (Evidence)

Mathematical Reasoning (GSM8K, MATH): Accuracy stays close to that of the large model while cost drops by 40-60%.

Code Generation (HumanEval): Cost is significantly lower, and pass rates are slightly higher in some scenarios.

Ablation: The learned routing strategy is what makes the difference; random routing and fixed-threshold strategies perform poorly.
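To see how a routing ratio translates into a cost figure like the 40-60% range above, a back-of-the-envelope model helps. The sketch below assumes each token is generated by exactly one model and that cost is measured relative to running the large model on every token. The function names and the input values (40% of tokens routed to the large model, a small model costing 10% as much per token) are illustrative assumptions, not numbers from the paper.

```python
def relative_cost(large_fraction: float, small_cost_ratio: float,
                  router_overhead: float = 0.0) -> float:
    """Cost relative to large-model-only decoding (1.0 = no savings).

    large_fraction   -- share of tokens routed to the large model
    small_cost_ratio -- per-token cost of the small model / large model
    router_overhead  -- per-token router cost relative to the large model
    """
    if not 0.0 <= large_fraction <= 1.0:
        raise ValueError("large_fraction must be in [0, 1]")
    return large_fraction + (1.0 - large_fraction) * small_cost_ratio + router_overhead


def max_large_fraction(target_reduction: float, small_cost_ratio: float,
                       router_overhead: float = 0.0) -> float:
    """Largest share of tokens the large model may handle while still
    reaching the target cost reduction (requires small_cost_ratio < 1)."""
    budget = 1.0 - target_reduction - router_overhead
    fraction = (budget - small_cost_ratio) / (1.0 - small_cost_ratio)
    return min(1.0, max(0.0, fraction))


# Illustrative inputs only: 40% of tokens on the large model, small model at 10% cost.
reduction = 1.0 - relative_cost(0.4, 0.1)   # roughly 0.54, i.e. about 54% cheaper
```

Under these assumptions, a 50% cost reduction allows at most about 44% of tokens to go to the large model, which is why the quality of the router's difficulty predictions matters so much.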

Section 05

Application Scenarios and Deployment Recommendations (Suggestions)

Applicable Scenarios: Cost-sensitive online services, edge devices (a local small model paired with a cloud large model), and multi-tenant systems (where the quality-cost trade-off is adjusted to each user's preferences).

Deployment Recommendations: Train the routing strategy on data from the target task, and set up monitoring that tracks both quality and cost.
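The monitoring recommendation could look something like the following. This is a hedged sketch rather than anything prescribed by the paper: the `RoutingMonitor` class, its thresholds, and the idea of using a per-request correctness flag as the quality signal are all assumptions made for illustration.

```python
from collections import deque
from typing import List


class RoutingMonitor:
    """Rolling monitor over the most recent requests (hypothetical helper)."""

    def __init__(self, window: int = 1000,
                 max_large_fraction: float = 0.5,
                 min_quality: float = 0.9) -> None:
        self.max_large_fraction = max_large_fraction  # assumed cost budget
        self.min_quality = min_quality                # assumed quality floor
        self._records = deque(maxlen=window)          # oldest entries drop out

    def record(self, large_tokens: int, total_tokens: int, correct: bool) -> None:
        """Log one request: tokens sent to the large model, total tokens,
        and whether the answer passed the chosen quality check."""
        self._records.append((large_tokens, total_tokens, correct))

    def large_fraction(self) -> float:
        total = sum(t for _, t, _ in self._records)
        large = sum(lg for lg, _, _ in self._records)
        return large / total if total else 0.0

    def quality(self) -> float:
        if not self._records:
            return 1.0
        return sum(1 for _, _, ok in self._records if ok) / len(self._records)

    def alerts(self) -> List[str]:
        """Return human-readable warnings when cost or quality drifts."""
        found = []
        if self.large_fraction() > self.max_large_fraction:
            found.append("cost: large-model share above budget")
        if self.quality() < self.min_quality:
            found.append("quality: pass rate below floor")
        return found
```

A multi-tenant deployment could keep one monitor per tenant, each with its own `max_large_fraction` and `min_quality`, and use the alerts to decide when the routing threshold needs retuning.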

Section 06

Limitations and Future Directions (Conclusion)

Limitations: Training the router requires golden outputs from the large model, and multimodal reasoning remains unexplored.

Future Directions: Weakly supervised or RL-based training of the routing strategy, extension to multimodal reasoning, systems with more than two models, and global optimization over the whole reasoning path.

Conclusion: R2R offers a promising direction for LLM inference optimization, and it underlines that careful system design, not model quality alone, is key to practical deployment.