# Detection-Extraction Gap: Large Models Already Know the Answer But Can't Output It

> This article describes the "Detection-Extraction Gap" in large reasoning models: a model often settles on its answer early in the chain of thought, yet forced extraction frequently fails to elicit it. The proposed BAEE method truncates 70-78% of generation while improving accuracy by 1-5 percentage points (pp).

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-04-08T02:47:31.000Z
- Last activity: 2026-04-09T02:12:28.506Z
- Popularity: 127.6
- Keywords: large language models, inference optimization, early exit, chain of thought, detection-extraction gap, BAEE, reasoning efficiency, decoding strategies
- Page link: https://www.zingnex.cn/en/forum/thread/llm-arxiv-2604-06613v1
- Canonical: https://www.zingnex.cn/forum/thread/llm-arxiv-2604-06613v1
- Markdown source: floors_fallback

---

## [Main Floor] Detection-Extraction Gap: Large Models Already Know the Answer But Struggle to Output It; BAEE Method Enables Efficient Reasoning

Large reasoning models exhibit a "Detection-Extraction Gap": the answer is often determined early in the chain of thought, but forced extraction by prompt frequently fails to obtain it. The proposed BAEE method exploits this gap to truncate 70-78% of generation while improving accuracy by 1-5 pp. The floors below cover background, evidence, method, results, implications, and limitations.

## [Background] What is the Detection-Extraction Gap?

When large models generate a chain of thought, they often keep producing redundant content after they have already worked out the answer. The research team calls this the "Detection-Extraction Gap":

- **Detection**: Internal states or free continuation show that the model already "knows" the answer early in the chain of thought;
- **Extraction**: Standard prompt-conditioned decoding (forced extraction) often fails to produce that answer.

In short, the model has internally settled on the answer, but standard methods cannot reliably retrieve it.
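
The two query styles contrasted above can be sketched as prompt builders. This is a minimal illustration, not the paper's exact templates: the function names and the plain-text layout are assumptions, and only the forced-extraction question is taken from the wording quoted in this thread.

```python
def free_continuation_prompt(question: str, cot_prefix: str) -> str:
    """Free continuation: hand back the partial reasoning and let the
    model keep writing in its own trajectory (hypothetical layout)."""
    return f"{question}\n{cot_prefix}"


def forced_extraction_prompt(question: str, cot_prefix: str) -> str:
    """Forced extraction: append an explicit demand for the answer,
    which conditions the model on a prompt it did not produce itself."""
    return (
        f"{question}\n{cot_prefix}\n"
        "Based on the above reasoning, what is the answer?"
    )
```

The only difference between the two prompts is the appended instruction, which is what the gap is about: the same prefix can yield the answer under one and fail under the other.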

## [Evidence] Experimental Data Verifies the Existence of the Gap

Experimental data supports the existence of the gap:

1. Across 5 model configurations from 2 model families on 3 benchmarks, 52%-88% of chain-of-thought tokens are redundant content generated after the answer has been determined;
2. Given only the first 10% of the chain of thought as a prefix, free continuation can still recover the correct answer, whereas forced extraction (e.g., asking "Based on the above reasoning, what is the answer?") fails up to 42% of the time;
3. On the theoretical side, a total variation bound analysis shows that the conditioning imposed by forced extraction shifts the output distribution and interrupts the natural reasoning trajectory, which leads to failure.
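
For readers unfamiliar with the quantity in point 3, the standard textbook definition of total variation distance is given below. This is not the specific bound derived in the paper, and the symbols $P_{\text{free}}$ and $P_{\text{forced}}$ (the distributions over completions $y$ under free continuation and forced extraction, given the same prefix) are notation assumed here for illustration.

$$
\mathrm{TV}(P_{\text{free}}, P_{\text{forced}})
= \sup_{A} \bigl| P_{\text{free}}(A) - P_{\text{forced}}(A) \bigr|
= \frac{1}{2} \sum_{y} \bigl| P_{\text{free}}(y) - P_{\text{forced}}(y) \bigr|
$$

Taking $A$ to be the set of completions that contain the correct answer gives

$$
\bigl| P_{\text{free}}(\text{correct}) - P_{\text{forced}}(\text{correct}) \bigr|
\le \mathrm{TV}(P_{\text{free}}, P_{\text{forced}}),
$$

so a large difference in success rates between the two query styles can only occur if the extraction prompt has shifted the output distribution substantially.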

## [Method] BAEE: Black-box Adaptive Early Exit Strategy

BAEE (Black-box Adaptive Early Exit) is a black-box efficient reasoning method that leverages the gap. Its core steps are:
1. **Detect Answer Readiness**: During generation, periodically use lightweight free continuation tests to determine whether the model is ready to output the answer;
2. **Extract and Terminate**: Once readiness is detected, extract the answer via free continuation and stop generation immediately to avoid redundant content.
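
The two steps above can be sketched as a loop around a black-box text generator. This is a sketch under stated assumptions, not the authors' implementation: the `generate` callable, the `Answer:` output format, the chunk and probe sizes, and the "all probes agree" readiness rule are all hypothetical choices made for illustration.

```python
import re
from collections import Counter
from typing import Callable, Optional, Tuple

# Hypothetical black-box interface: (prompt, max_new_tokens) -> generated text.
Generate = Callable[[str, int], str]

# Assumed answer format; a real task would need its own parser.
ANSWER_RE = re.compile(r"Answer:\s*(\S+)")


def parse_answer(text: str) -> Optional[str]:
    """Return the first 'Answer: X' value in text, or None if absent."""
    match = ANSWER_RE.search(text)
    return match.group(1) if match else None


def early_exit_generate(
    generate: Generate,
    question: str,
    chunk_tokens: int = 256,
    probe_tokens: int = 64,
    n_probes: int = 3,
    max_chunks: int = 16,
) -> Tuple[Optional[str], str]:
    """Generate reasoning chunk by chunk, probing for readiness after each."""
    cot = ""
    for _ in range(max_chunks):
        chunk = generate(f"{question}\n{cot}", chunk_tokens)
        cot += chunk
        finished = parse_answer(chunk)
        if finished is not None:
            return finished, cot  # the model reached its answer on its own

        # Step 1: readiness test via short free continuations of the prefix.
        probes = [
            parse_answer(generate(f"{question}\n{cot}", probe_tokens))
            for _ in range(n_probes)
        ]
        votes = Counter(a for a in probes if a is not None)

        # Step 2: if every probe yields the same answer, extract it and stop.
        if votes:
            answer, count = votes.most_common(1)[0]
            if count == n_probes:
                return answer, cot
    return None, cot  # budget exhausted without a confident answer
```

Because the probes are ordinary continuations rather than an appended question, the sketch stays on the "detection" side of the gap. Each probe is an extra API call, so the probe count and chunk size trade detection overhead against how early the loop can exit.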

## [Results] BAEE Brings Significant Efficiency and Performance Improvements

BAEE improves both efficiency and accuracy:

- **Generation Truncation Rate**: 70%-78% of generation is cut, greatly reducing redundant tokens;
- **Accuracy Improvement**: gains of roughly 1-5 pp on all tested models, reaching up to 5.8 pp for models with an explicit thinking mode (e.g., DeepSeek-R1);
- **Cost Optimization**: cost-optimized variants need a median of only 9 API calls while still achieving 52%-73% truncation, balancing cost against efficiency.
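
To see how truncation and probe overhead interact, the arithmetic can be sketched as follows. The chain length and tokens-per-probe figures are made-up inputs, and the sketch assumes the reported truncation rate does not already account for probe tokens, which the thread does not specify.

```python
def net_tokens_saved(
    full_cot_tokens: int,
    truncation_rate: float,
    n_probe_calls: int,
    tokens_per_probe: int,
) -> int:
    """Tokens saved by early exit after paying for readiness probes."""
    kept = round(full_cot_tokens * (1.0 - truncation_rate))
    overhead = n_probe_calls * tokens_per_probe
    return full_cot_tokens - kept - overhead


# Hypothetical 10,000-token chain, 9 probe calls of 64 tokens each,
# evaluated at the two ends of the reported 52%-73% truncation range.
for rate in (0.52, 0.73):
    print(rate, net_tokens_saved(10_000, rate, 9, 64))
```

Under these assumed inputs, probe overhead is small relative to the tokens removed, but the balance shifts for short chains or expensive probes.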

## [Implications and Applications] Value for Model Design and Practical Scenarios

**Model Design**:

- Reconsider the role of chain of thought: longer chains are not necessarily deeper, and redundant tokens may signal an inability to stop in time;
- Optimize decoding strategies: smarter strategies are needed to recognize when the answer is ready;
- Adjust training objectives: introduce early-exit objectives so that models organize their reasoning more efficiently.

**Practical Applications**:

- Reduce API costs (generated tokens cut by 70% or more in the reported results);
- Reduce response latency and improve real-time interaction;
- Avoid lengthy reasoning displays and improve the user experience.

## [Limitations and Outlook] Future Research Directions

**Limitations**:

- Detection frequency and timing need further optimization;
- Some tasks (e.g., multi-step mathematical proofs) require more cautious early-exit strategies.

**Future**:

- Study optimal detection points to balance probing overhead against exit opportunities;
- Explore applicability to different task types;
- Combine model internal states (white-box methods) to improve detection accuracy.