Zing Forum

MARS: A Margin-Adversarial Risk-Controlled Early Stopping Strategy

MARS monitors aggregate vote dynamics at intermediate checkpoints to predict which reasoning trajectories might still change the answer, saving 25-47% of tokens with no loss of accuracy.

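Test-time scaling, early stopping strategy, inference optimization, majority voting, MARS, computational efficiency, LLM inference
Published 2026-06-11 13:56 | Recent activity 2026-06-12 09:25 | Estimated read 5 min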

Section 01

【Introduction】MARS: Margin-Adversarial Risk-Controlled Early Stopping Strategy, Saving 25-47% of Tokens Without Accuracy Loss

MARS (Margin-Adversarial Risk-controlled Stopping) is a research result published on arXiv on June 11, 2026. It targets the computational overhead of parallel test-time scaling: by monitoring aggregate vote dynamics at intermediate checkpoints, it predicts which reasoning trajectories might still change the answer. Its margin-adversarial stopping rule saves 25-47% of tokens with no loss of accuracy. The core idea is to separate two sources of uncertainty, the trajectory-level switching probability and the adversarial boundary, which makes early stopping risk-controlled.

Section 02

【Background】The Computational Dilemma of Parallel Test-Time Scaling

Test-time scaling improves LLM reasoning by sampling many reasoning trajectories and aggregating their answers by majority vote. Because every trajectory must run to completion, the computational overhead is large. The research team observed that a current answer can be extracted at intermediate checkpoints and that the aggregate voting pattern evolves as reasoning progresses. This raises the question: can trajectories that no longer affect the outcome be terminated early without hurting accuracy?
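To make the checkpoint observation concrete, here is a minimal sketch of majority voting applied to answers extracted at successive checkpoints. The helper `majority_vote` and the sample answers are made up for illustration and are not taken from the paper; `None` marks a trajectory that has not produced an extractable answer yet.

```python
from collections import Counter


def majority_vote(answers):
    """Return (leading answer, margin over the runner-up) for a list of answers."""
    counts = Counter(a for a in answers if a is not None)
    if not counts:
        return None, 0
    ranked = counts.most_common(2)
    leader, top = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    return leader, top - runner_up


# Hypothetical answers from 6 trajectories at three checkpoints.
checkpoints = [
    ["12", "7", None, "12", "7", "9"],
    ["12", "7", "12", "12", "7", "12"],
    ["12", "12", "12", "12", "7", "12"],
]

for step, answers in enumerate(checkpoints, start=1):
    leader, margin = majority_vote(answers)
    print(f"checkpoint {step}: leader={leader} margin={margin}")
```

In this toy run the margin grows from one checkpoint to the next, which is the kind of signal an early-stopping rule can act on.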

Section 03

【Methodology】Core Ideas and Implementation of MARS

MARS introduces a margin-adversarial stopping rule: it estimates how likely each active trajectory is to change its answer and stops generation once the leading answer is safe. The key is to separate two sources of uncertainty: 1. Trajectory-level switching probability (the probability that a trajectory will change its answer later); 2. Adversarial boundary (a conservative, worst-case treatment of which answer a switching trajectory moves to). In practice, the switching probability is predicted by a five-feature logistic regression model (features include voting margin, trajectory confidence, etc.), which is low-overhead, interpretable, and generalizes well.
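The following is a rough sketch of how a margin-adversarial check could look, not the paper's actual rule. It assumes each trajectory already carries a switching probability `p_switch` (for example from the logistic model), treats every switch as moving a vote to the challenger, and bounds the flip risk with Markov's inequality. The names `Trajectory`, `adversarial_risk`, `should_stop` and the threshold `delta` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Trajectory:
    answer: str           # answer extracted at the current checkpoint
    p_switch: float       # assumed probability that the final answer differs
    active: bool = True   # False once generation has finished


def adversarial_risk(trajs, counts, leader, challenger):
    """Markov-style upper bound on the chance that `challenger` catches the leader."""
    margin = counts[leader] - counts[challenger]
    if margin <= 0:
        return 1.0
    expected_loss = 0.0
    for t in trajs:
        if not t.active or t.answer == challenger:
            continue  # finished votes are fixed; challenger votes stay put in the worst case
        # Worst case: a switching trajectory moves its vote to the challenger.
        loss = 2 if t.answer == leader else 1
        expected_loss += t.p_switch * loss
    return min(1.0, expected_loss / margin)


def should_stop(trajs, delta=0.05):
    """Return (stop, leader, risk) for a non-empty list of trajectories."""
    counts = Counter(t.answer for t in trajs)
    leader = counts.most_common(1)[0][0]
    # None stands for an answer that no trajectory has produced yet (count 0).
    challengers = [a for a in counts if a != leader] + [None]
    risk = max(adversarial_risk(trajs, counts, leader, c) for c in challengers)
    return risk <= delta, leader, risk


# Hypothetical checkpoints: one near-unanimous, one contested.
confident = [Trajectory("42", 0.001) for _ in range(10)]
contested = [Trajectory("42", 0.3)] * 3 + [Trajectory("17", 0.3)] * 2
print(should_stop(confident))
print(should_stop(contested))
```

Taking the maximum over challengers is a simplification; a stricter variant would sum the per-challenger bounds, at the cost of stopping later.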

Section 04

【Experiments】Significant Computational Savings

In evaluations across three reasoning models and three competition math benchmarks, MARS saved 25-47% of tokens compared to standard self-consistency with no loss of accuracy. It saved a further 14-29% of tokens compared to DeepConf Online, a strong baseline that already filters weak trajectories, which indicates that the method is both effective and complementary to existing filtering.

Section 05

【Conclusion】Technical Contributions and Theoretical Guarantees

Beyond its practical gains, MARS provides a structured analysis framework that separates two sources of uncertainty. On the theory side, when the switching probabilities are accurate, the early-stopped answer agrees with the full-vote result with high probability. This risk control suits accuracy-sensitive scenarios, and the adversarial boundary accounts for the worst case, which improves robustness.
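One way to see why accurate switching probabilities can translate into a guarantee is a simple Markov-inequality argument. This is an illustrative sketch under stated assumptions, not the paper's own theorem; the symbols m, p_i, \ell_i, S_i and \delta are notation introduced here.

```latex
% m       : current vote margin of the leader over a fixed challenger (m > 0)
% p_i     : switching probability of active trajectory i (assumed accurate)
% \ell_i  : worst-case margin loss if trajectory i switches, \ell_i \in \{0, 1, 2\}
% S_i     : indicator that trajectory i switches, S_i \sim \mathrm{Bernoulli}(p_i)
\[
L = \sum_i \ell_i S_i,
\qquad
\Pr[L \ge m] \;\le\; \frac{\mathbb{E}[L]}{m} \;=\; \frac{\sum_i \ell_i \, p_i}{m}.
\]
% If the right-hand side is at most \delta, the leader keeps its lead over
% this challenger with probability at least 1 - \delta.
```

The bound needs no independence between trajectories, since Markov's inequality uses only the expectation of L.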

Section 06

【Applications and Limitations】Applicable Scenarios and Future Directions

MARS applies to parallel test-time scaling scenarios such as mathematical problem solving and code generation. Limitations: it currently targets majority-vote aggregation, so other aggregation methods would require adjustments; it also relies on the accuracy of the switching probability model, so out-of-distribution scenarios need recalibration. Even so, it is an important step forward in efficiency optimization.