# Disagreement-Guided Strategy Routing: Enabling Large Models to Vote When Needed and Rewrite When Necessary

> Large reasoning models perform unstably on mathematical tasks. The proposed framework selects a test-time scaling strategy per instance based on how much the model's sampled outputs disagree: lightweight processing when samples are consistent, majority voting under moderate disagreement, and problem rewriting for highly ambiguous instances. It reports a 3-7% accuracy improvement while reducing sampling costs.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-04-29T13:11:39.000Z
- Last activity: 2026-04-30T02:35:15.958Z
- Popularity: 142.6
- Keywords: test-time scaling, large model reasoning, mathematical reasoning, strategy routing, majority voting, problem rewriting
- Page link: https://www.zingnex.cn/en/forum/thread/llm-arxiv-2604-26644v1
- Canonical: https://www.zingnex.cn/forum/thread/llm-arxiv-2604-26644v1
- Markdown source: floors_fallback

---

## [Introduction] Disagreement-Guided Strategy Routing: Making Large Model Reasoning Smarter and More Efficient

Large reasoning models exhibit unstable performance on mathematical tasks. Existing test-time scaling strategies incur high computational overhead and apply one approach to every instance regardless of difficulty. This study proposes a disagreement-guided strategy routing framework that selects a processing strategy per instance based on output disagreement: lightweight processing for low-disagreement instances, majority voting for moderate disagreement, and problem rewriting for highly ambiguous instances. The framework achieves a 3-7% accuracy improvement while reducing sampling costs, and it can be integrated into existing reasoning pipelines without additional training.

## Background: Test-Time Dilemma of Large Model Reasoning

Large Reasoning Models (LRMs) excel at complex tasks such as mathematical reasoning and code generation, but their performance becomes highly unstable on difficult instances. To improve reliability, researchers have developed test-time scaling strategies such as repeated sampling, self-correction, and tree search. These can boost accuracy, but they incur significant computational overhead, and their marginal gain diminishes on difficult problems. The core issue is that existing methods apply the same strategy to every instance without adapting to its difficulty.

## Core Insight: Disagreement Degree is a Key Signal for Difficulty and Correctness

The study found that the degree of disagreement among a model's sampled outputs correlates strongly with instance difficulty and prediction correctness:
- Low disagreement (the model is confident): multiple sampled outputs are highly consistent;
- Moderate disagreement: sampled results differ noticeably, but the correct answer is usually in the majority;
- High disagreement: results vary widely, and even the majority answer may be wrong.

Disagreement degree can therefore serve as a nearly free indicator of instance difficulty: it is estimated from a small number of samples and needs no computation beyond them.
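
As an illustration of how such a signal might be computed, here is a minimal sketch that scores disagreement as one minus the share of the most common answer. This particular measure, the `normalize` helper, and the exact-match comparison are assumptions made for the example; the summary does not state which measure the paper uses.

```python
from collections import Counter


def normalize(answer: str) -> str:
    """Crude canonical form for exact-match comparison (an assumption)."""
    return answer.strip().lower().rstrip(".")


def disagreement(answers: list[str]) -> float:
    """Return 1 - (share of the most common answer).

    0.0 means every sample agrees; values near 1.0 mean almost
    every sample gave a different final answer.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(normalize(a) for a in answers)
    top_count = counts.most_common(1)[0][1]
    return 1.0 - top_count / len(answers)
```

With four samples of which two agree, this gives a score of 0.5. Exact string matching treats equivalent forms such as `1/2` and `0.5` as different answers, so a real pipeline would likely need a task-specific normalizer.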

## Strategy Routing Framework: Dynamically Selecting Optimal Computational Strategies

The framework dynamically selects strategies based on disagreement degree:
1. **Lightweight Parsing**: For low disagreement, take the first or first few sampling results with almost no additional cost;
2. **Majority Voting**: For moderate disagreement, generate multiple samples and select the most common answer to filter out occasional errors;
3. **Rewrite and Reconstruction**: For high disagreement, change the presentation by restating the problem, decomposing subproblems, etc., to provide new reasoning entry points.
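
The three-way routing above can be sketched as a pair of threshold comparisons plus a plain majority vote. The threshold values and the strategy labels below are hypothetical placeholders, not values reported by the study.

```python
from collections import Counter

# Hypothetical thresholds on a disagreement score in [0, 1].
LOW_THRESHOLD = 0.2
HIGH_THRESHOLD = 0.6


def route(score: float) -> str:
    """Map a disagreement score to one of the three strategies."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if score <= LOW_THRESHOLD:
        return "lightweight"
    if score <= HIGH_THRESHOLD:
        return "majority_vote"
    return "rewrite"


def majority_vote(answers: list[str]) -> str:
    """Return the most common answer; ties go to the answer seen first."""
    if not answers:
        raise ValueError("need at least one answer")
    return Counter(answers).most_common(1)[0][0]
```

Keeping the router separate from the strategies means the thresholds can be tuned without touching the sampling or voting code.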

## Implementation Advantages: Training-Free and Modular Design

The framework is training-free: it requires no additional model training or fine-tuning and can be integrated into existing LRM reasoning pipelines. The implementation process is:
1. Draw an initial set of samples (3-5) to obtain candidate outputs;
2. Calculate the disagreement degree (string matching, semantic similarity, etc.);
3. Route to a strategy according to thresholds;
4. Output the result.

The modular design supports adjusting parameters such as thresholds, the number of samples, and rewriting strategies.
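
The four steps can be sketched end to end as follows. This is a sketch under stated assumptions, not the authors' implementation: the `Sampler` callable stands in for whatever LRM client is in use, and the default thresholds, sample counts, and rewrite prompt are all hypothetical.

```python
from collections import Counter
from typing import Callable

# Hypothetical interface: (prompt, n) -> n final answers from the model.
Sampler = Callable[[str, int], list[str]]

# Hypothetical rewrite template; the study's own templates are not given here.
REWRITE_TEMPLATE = (
    "Restate the following problem in your own words, "
    "then solve it step by step.\n\n{problem}"
)


def solve(problem: str, sample: Sampler, n_initial: int = 4, n_vote: int = 8,
          low: float = 0.2, high: float = 0.6) -> tuple[str, str]:
    """Return (answer, strategy_used) for one problem."""
    # Step 1: a small initial batch of samples.
    initial = sample(problem, n_initial)
    # Step 2: disagreement as 1 - share of the most common answer.
    top_answer, top_count = Counter(initial).most_common(1)[0]
    score = 1.0 - top_count / len(initial)
    # Steps 3 and 4: route by threshold and return the result.
    if score <= low:
        return top_answer, "lightweight"
    if score <= high:
        votes = Counter(initial + sample(problem, n_vote))
        return votes.most_common(1)[0][0], "majority_vote"
    retry = sample(REWRITE_TEMPLATE.format(problem=problem), n_vote)
    return Counter(retry).most_common(1)[0][0], "rewrite"


def fake_sampler(prompt: str, n: int) -> list[str]:
    """Deterministic stand-in for a model, for demonstration only."""
    pool = ["12", "12", "15", "12"]
    return [pool[i % len(pool)] for i in range(n)]
```

Calling `solve("some problem", fake_sampler)` takes the majority-vote branch, because one of the four initial samples disagrees. Passing the sampler in as a parameter keeps the router independent of any particular model API.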

## Experimental Validation: Accuracy and Efficiency Improvements on Mathematical Benchmarks

The framework was validated on seven mathematical reasoning benchmarks (including arithmetic, algebra, etc.) and three LRMs:
- Average accuracy improved by 3%-7%; the gain was statistically significant and consistent across models;
- Sampling costs fell, avoiding wasted computation on simple problems and ineffective search on difficult ones;
- The strategy distribution varies by dataset: lightweight parsing accounts for a high proportion on simple datasets, while majority voting dominates on competition-level datasets.

## Technical Implications and Future Research Directions

The study brings three implications:
1. Disagreement as a meta-signal can guide resource allocation and be extended to scenarios such as active learning and uncertainty quantification;
2. Strategy diversity matters: future research can explore further strategies such as external tool calling, multimodal reasoning, and human-machine collaboration;
3. Rewriting strategies deserve attention: systematic, automated problem rewriting can improve reasoning performance.

## Limitations and Open Problems

The current framework has limitations:
- Disagreement thresholds rely on heuristics or grid search and lack theoretical guidance;
- Rewriting strategies are based on templates and rules, and more intelligent methods are needed;
- Experiments are limited to mathematical tasks, so effectiveness in other domains (code generation, common-sense reasoning) remains to be verified.

Future work needs to address threshold selection, more intelligent rewriting, and cross-domain adaptation.
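
To make the threshold limitation concrete, here is a minimal sketch of what a grid search over the two thresholds might look like. The record format, the `cost_weight` trade-off, and the utility function are assumptions for illustration; the summary does not describe the search procedure the study used.

```python
from itertools import product

# Hypothetical record: (disagreement score, {strategy: (correct, cost)}),
# gathered by running every strategy on a labeled validation set.
Record = tuple[float, dict[str, tuple[bool, int]]]


def route(score: float, low: float, high: float) -> str:
    if score <= low:
        return "lightweight"
    if score <= high:
        return "majority_vote"
    return "rewrite"


def grid_search(records: list[Record], grid: list[float],
                cost_weight: float = 0.0) -> tuple[float, float, float]:
    """Return (utility, low, high) maximizing accuracy - cost_weight * cost."""
    best = None
    for low, high in product(grid, repeat=2):
        if low >= high:
            continue  # thresholds must be ordered
        total = 0.0
        for score, outcomes in records:
            correct, cost = outcomes[route(score, low, high)]
            total += float(correct) - cost_weight * cost
        utility = total / len(records)
        if best is None or utility > best[0]:
            best = (utility, low, high)
    if best is None:
        raise ValueError("grid needs at least two distinct values")
    return best
```

A search like this tunes thresholds to one validation set and one model, which is exactly why the lack of theoretical guidance is listed as a limitation.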