# DRG: A Training-Free Finalization Recovery Method for Reasoning Models Under Strict Token Constraints

> Detect-Restart-Gate (DRG) is a training-free method that detects pathological signals (repetition, excessive length, stagnation) in reasoning model outputs, retries generation when they appear, and gates which answer to keep, with the aim of improving accuracy on mathematical reasoning tasks under strict token budgets.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-05-24T18:11:46.000Z
- Last activity: 2026-05-24T18:17:59.141Z
- Popularity: 156.9
- Keywords: reasoning models, token limits, training-free methods, self-consistency, mathematical reasoning, greedy decoding, sampling retry, gating mechanism, DeepSeek-R1, Qwen3, Ministral
- Page link: https://www.zingnex.cn/en/forum/thread/drg-token
- Canonical: https://www.zingnex.cn/forum/thread/drg-token
- Markdown source: floors_fallback

---

## DRG Method Introduction: Training-Free Solution to Output Quality Issues of Reasoning Models Under Token Constraints

Detect-Restart-Gate (DRG) is a training-free method for recovering a usable final answer when a reasoning model runs up against a strict token budget. It checks the reasoning output for three pathological signals (repetition, excessive length, stagnation), retries generation when they are triggered, and then gates which answer to keep, with the aim of improving accuracy on mathematical reasoning tasks. The method was released by AnonymousAuthor0211 on GitHub on May 24, 2026 (project link: https://github.com/AnonymousAuthor0211/detect-restart-gate).

## Background: Token Budget Bottlenecks Faced by Reasoning Models and Limitations of Traditional Solutions

In recent years, LLMs such as DeepSeek-R1, Qwen3, and Ministral have demonstrated strong reasoning capabilities through chain-of-thought. Their outputs are verbose, however, so under a limited token budget the model often runs out of tokens before it states a final answer, leaving the result 'unfinished'. The traditional remedies both have drawbacks: supervised fine-tuning requires significant training resources, and self-consistency methods have high inference costs because they sample many outputs per problem.

## Detailed Explanation of DRG's Three-Stage Mechanism: Detect-Restart-Gate

DRG operates through a three-stage mechanism:
1. **Detection**: Generate a baseline output with greedy decoding, then check it in parallel for three types of pathological signal: repetition (score above 0.7), length (above the 85th percentile, P85), and stagnation (no new terms for 4 consecutive lines);
2. **Retry**: When triggered, regenerate with a sampling strategy (temperature=0.7, top_p=0.95), using a prompt that contains the original problem and the last 1200 characters of the baseline;
3. **Gate**: Choose the final answer based on the number of pathological signals. If the retry answer is consistent with the baseline, accept the baseline; if the baseline is highly pathological and an answer can be extracted from the retry, accept the retry; otherwise, fall back to SC-2 (self-consistency: select the answer from two samples).
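
The three stages above can be sketched as follows. This is a minimal illustration rather than the project's implementation: all function names are hypothetical, the repetition score is assumed to be the fraction of duplicated word 4-grams, length is measured in whitespace-separated words against a P85 value calibrated on baseline outputs, the retry prompt layout is a guess, and "highly pathological" is assumed to mean two or more signals. Only the constants 0.7, 4, and 1200 come from the description above.

```python
import re
import statistics
from typing import List, Optional

REPETITION_THRESHOLD = 0.7   # from the description above
STAGNATION_WINDOW = 4        # from the description above
RETRY_TAIL_CHARS = 1200      # from the description above


def repetition_score(text: str, n: int = 4) -> float:
    """Assumed metric: share of word n-grams that repeat an earlier n-gram."""
    words = text.split()
    grams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not grams:
        return 0.0
    return 1.0 - len(set(grams)) / len(grams)


def is_stagnant(text: str, window: int = STAGNATION_WINDOW) -> bool:
    """True if `window` consecutive non-empty lines add no unseen term."""
    seen = set()
    run = 0
    for line in text.splitlines():
        terms = set(re.findall(r"\w+", line.lower()))
        if not terms:
            continue  # skip blank lines
        if terms <= seen:
            run += 1
            if run >= window:
                return True
        else:
            run = 0
        seen |= terms
    return False


def calibrate_length_p85(lengths: List[int]) -> float:
    """Assumption: P85 is the 85th percentile of baseline output lengths."""
    return statistics.quantiles(lengths, n=100)[84]


def detect(text: str, length_p85: float) -> List[str]:
    """Return the names of the pathological signals that fire."""
    signals = []
    if repetition_score(text) > REPETITION_THRESHOLD:
        signals.append("repetition")
    if len(text.split()) > length_p85:
        signals.append("length")
    if is_stagnant(text):
        signals.append("stagnation")
    return signals


def build_retry_prompt(problem: str, baseline: str) -> str:
    """Hypothetical layout: the problem followed by the baseline's tail."""
    return f"{problem}\n\n{baseline[-RETRY_TAIL_CHARS:]}"


def gate(signals: List[str], baseline_answer: Optional[str],
         retry_answer: Optional[str], high_pathology: int = 2) -> str:
    """Return which route supplies the answer: 'baseline', 'retry', or 'sc2'."""
    if not signals:
        return "baseline"          # nothing triggered, no retry needed
    if retry_answer is not None and retry_answer == baseline_answer:
        return "baseline"          # retry agrees with the baseline
    if len(signals) >= high_pathology and retry_answer is not None:
        return "retry"             # baseline looks badly degraded
    return "sc2"                   # fall back to two-sample self-consistency
```

In this sketch the sampling call itself (temperature=0.7, top_p=0.95) is left to whatever inference backend is in use; `gate` only decides which route supplies the final answer.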

## DRG Experimental Design and Implementation Details: Reproducible Framework and Support Capabilities

DRG provides a reproducible experimental framework:
- **Multi-GPU Support**: Data sharding (parallel processing for large datasets) and model sharding (memory optimization for large models);
- **Datasets and Models**: Supports mathematical reasoning datasets like MATH-500 and AIME2024, compatible with models such as Qwen3 and distilled versions of DeepSeek-R1;
- **Answer Extraction and Scoring**: Strip thinking content, extract `\boxed{}` expressions, and score via string normalization and sympy symbolic verification.

## DRG Technical Highlights: Value of Zero Training Cost and Intelligent Strategy Design

DRG's innovations include:
1. **Zero Training Cost**: No parameter updates needed, can be applied to off-the-shelf models immediately;
2. **Fine-Grained Detection**: Capture output quality issues from multiple dimensions;
3. **Cost-Quality Tradeoff**: Hierarchical strategy (greedy → sampling → SC-2) controls additional overhead;
4. **Interpretable Path**: Record decision paths for easy diagnosis of failure modes.
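
As an illustration of the last two points, the sketch below records one decision trace per problem and summarizes how often each route was taken and how many generations were spent on average. The record fields, route names, and JSON Lines output are assumptions made for this example; the project's actual log format is not described here.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional


@dataclass
class DecisionTrace:
    """Hypothetical per-problem record of the path DRG took."""
    problem_id: str
    signals: List[str]          # pathological signals that fired
    route: str                  # "baseline", "retry", or "sc2"
    generations: int            # model calls spent on this problem
    final_answer: Optional[str] = None


def to_jsonl(traces: List[DecisionTrace]) -> str:
    """Serialize traces as JSON Lines for later inspection."""
    return "\n".join(json.dumps(asdict(t), sort_keys=True) for t in traces)


def summarize(traces: List[DecisionTrace]) -> Dict[str, object]:
    """Count routes and compute the average number of generations."""
    routes = Counter(t.route for t in traces)
    avg = sum(t.generations for t in traces) / len(traces) if traces else 0.0
    return {"routes": dict(routes), "avg_generations": avg}


# Illustrative values only; they are not results from the project.
example = [
    DecisionTrace("p1", [], "baseline", 1, "42"),
    DecisionTrace("p2", ["repetition", "length"], "retry", 2, "7"),
]
print(to_jsonl(example))
print(summarize(example))
```

The average generation count gives a rough view of the extra overhead the hierarchical strategy adds over plain greedy decoding, while the per-problem traces show which signals led to which route.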

## Limitations of DRG and Future Research Directions

DRG has limitations:
- **Threshold Sensitivity**: Trigger thresholds need to be calibrated for different models/datasets;
- **Domain Specificity**: Currently adapted for mathematical reasoning; porting it to other domains requires adjustments;
- **Sampling Randomness**: Because the retry is sampled, its quality can vary and may be lower than the baseline's.

Future directions: explore learning-based triggers, integrate inference acceleration techniques, and verify effectiveness on large-scale models.

## Conclusion: Practical Value of DRG for Reasoning Model Deployment

DRG offers a practical option for deploying reasoning models in resource-constrained scenarios, illustrating that output quality can be improved without modifying model parameters. Its code and documentation provide a basis for reproduction and extension, and training-free optimization methods of this kind are likely to play an important role in real-world deployments.
