# STACK: An Efficient Reasoning Compression Framework for Large Reasoning Models to "Think Less, Do More"

> This article introduces the STACK framework, which reduces reasoning length by 59.9% while maintaining or even improving accuracy through state-aware reasoning compression and knowledge guidance. This method dynamically identifies redundant reasoning steps and combines PPO and DPO training strategies, opening up a new path for efficiency optimization of large reasoning models.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-04-10T09:31:41.000Z
- Last activity: 2026-04-13T01:53:11.033Z
- Popularity: 86.6
- Keywords: large reasoning models, chain-of-thought compression, efficient reasoning, PPO, DPO, retrieval augmentation, overthinking, machine learning
- Page link: https://www.zingnex.cn/en/forum/thread/stack
- Canonical: https://www.zingnex.cn/forum/thread/stack

---

## Introduction to the STACK Framework: A New Path for Efficient Reasoning of Large Reasoning Models

Large reasoning models (e.g., OpenAI o1, DeepSeek-R1) rely on long chains of thought to make progress on complex tasks, but overthinking raises computational cost, adds latency, and can lower accuracy. Through state-aware reasoning compression and knowledge guidance, the STACK framework reduces reasoning length by 59.9% while raising accuracy by 4.8 percentage points across three mathematical reasoning benchmarks.

## Background: Overthinking Problems of Large Reasoning Models and Limitations of Existing Compression Methods

### Overthinking Phenomena
1. **Redundant Verification Loops**: After reaching an initial conclusion, the model repeatedly verifies the same step, generating many tokens that add no new information;
2. **Self-Correction Quagmire**: The model falls into a cycle of doubt and correction, which can end in a wrong answer;
3. **Irrelevant Knowledge Proliferation**: The model recalls background knowledge unrelated to the problem, wasting resources and introducing interference.

### Limitations of Existing Compression Methods
- **Coarse-grained Compression**: Lacks fine-grained analysis, easily deletes key steps or retains redundancy;
- **Static Strategies**: Fixed rules cannot adapt to dynamic reasoning stages;
- **Trade-off Dilemma**: Aggressive compression sacrifices accuracy, while conservative compression fails to solve the root problem.

## Core Design of the STACK Framework: State Awareness and Dynamic Compression Mechanism

STACK addresses the problem through three innovations:
1. **State Awareness**: Dynamically identifies two redundant states: an uncertain or biased state, which requires external knowledge guidance, and an overconfident long-reasoning state, which the model can compress on its own;
2. **Dual Compression Mechanism**:
   - **Knowledge-guided Compression**: Retrieves external knowledge bases to correct biases, provide compression references, and enhance confidence;
   - **Self-prompt Compression**: Guides the model to identify repeated steps and generate concise equivalent reasoning;
3. **Early Stopping on Answer Convergence**: Terminates reasoning when the answer remains the same for N consecutive steps and confidence is stable, suppressing redundant verification.
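
The state routing and the convergence-based stopping rule described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Step` record, the `classify_state` and `should_stop` helpers, and every threshold (`low_conf`, `high_conf`, `long_budget`, `n`, `conf_tol`) are assumptions introduced here, since the article does not specify how confidence is measured or what N is.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class State(Enum):
    UNCERTAIN = "uncertain"          # would be routed to knowledge-guided compression
    OVERCONFIDENT_LONG = "overlong"  # would be routed to self-prompt compression
    NORMAL = "normal"                # no compression triggered


@dataclass
class Step:
    answer: Optional[str]  # candidate answer after this step, if any
    confidence: float      # hypothetical confidence score in [0, 1]
    tokens: int            # tokens generated in this step


def classify_state(step: Step, total_tokens: int,
                   low_conf: float = 0.4, high_conf: float = 0.9,
                   long_budget: int = 512) -> State:
    """Map the current step to one of the two redundant states (assumed thresholds)."""
    if step.confidence < low_conf:
        return State.UNCERTAIN
    if step.confidence >= high_conf and total_tokens > long_budget:
        return State.OVERCONFIDENT_LONG
    return State.NORMAL


def should_stop(steps: List[Step], n: int = 3, conf_tol: float = 0.05) -> bool:
    """Stop when the last n steps agree on one answer and confidence is stable."""
    if len(steps) < n:
        return False
    recent = steps[-n:]
    answers = {s.answer for s in recent}
    if None in answers or len(answers) != 1:
        return False
    confs = [s.confidence for s in recent]
    return max(confs) - min(confs) <= conf_tol
```

In a decoding loop, `should_stop` would be checked after each reasoning step, and `classify_state` would decide which of the two compression mechanisms to invoke, if any.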

## Training Strategy: Hybrid Training with PPO and DPO Collaboration

### Online Comparative Sample Construction
For each problem, generate a long version (an unconstrained chain of thought) and a short version (compressed reasoning), and use the two as a preference pair.
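
One way such pairs could be assembled is sketched below. The article does not say how the preferred side is chosen, so the rule used here is an assumption: prefer the short trace whenever it is correct, fall back to the long trace when compression broke the answer, and discard the pair when both are wrong. `PreferencePair`, `build_pair`, and the `is_correct` callback are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # preferred response
    rejected: str  # dispreferred response


def build_pair(prompt: str, long_trace: str, short_trace: str,
               is_correct: Callable[[str], bool]) -> Optional[PreferencePair]:
    """Build one preference pair from a long and a short reasoning trace."""
    if is_correct(short_trace):
        # Compressed reasoning reaches the right answer: prefer it.
        return PreferencePair(prompt, chosen=short_trace, rejected=long_trace)
    if is_correct(long_trace):
        # Compression lost the answer: prefer the long trace instead.
        return PreferencePair(prompt, chosen=long_trace, rejected=short_trace)
    # Both traces are wrong, so the pair carries no useful signal (assumed rule).
    return None
```

For example, `build_pair("2+2?", long_trace, short_trace, checker)` returns a pair whose `chosen` field is the short trace whenever `checker(short_trace)` is true.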

### Hybrid Training Objectives
- **PPO Component**: Optimizes the policy network to stably select compression actions;
- **DPO Component**: Uses preference signals to train concise reasoning generation;
- **Reward Function**: Includes accuracy rewards (positive for correct answers/negative for wrong answers) and efficiency rewards (higher for shorter lengths, with an over-compression threshold set).
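
The reward shape described in the last bullet might look like the sketch below. The exact formula is not given in the article, so everything here is an assumption: the `reward` function name, the ±1 accuracy term, the linear efficiency bonus relative to a reference length `ref_length`, the weight `efficiency_weight`, and the interpretation of the over-compression threshold (`min_ratio`) as a point below which the efficiency bonus is withheld.

```python
def reward(correct: bool, length: int, ref_length: int,
           min_ratio: float = 0.2, efficiency_weight: float = 0.5) -> float:
    """Combine an accuracy reward with a length-based efficiency reward."""
    if ref_length <= 0:
        raise ValueError("ref_length must be positive")

    # Accuracy term: positive for a correct answer, negative for a wrong one.
    accuracy = 1.0 if correct else -1.0

    # Efficiency term: larger for shorter traces, relative to the reference length.
    ratio = length / ref_length
    if ratio < min_ratio:
        # Assumed over-compression rule: no bonus below the threshold.
        efficiency = 0.0
    else:
        efficiency = max(0.0, 1.0 - ratio)

    return accuracy + efficiency_weight * efficiency
```

Under these assumed defaults, a correct 400-token answer against a 1000-token reference scores higher than a correct 800-token one, while a correct 100-token answer earns only the accuracy term because it falls below the over-compression threshold.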

## Experimental Validation: Win-Win of Efficiency and Accuracy

### Benchmark Settings
STACK was tested on three benchmarks: GSM8K (grade-school math word problems), MATH (competition-level mathematics), and OlympiadBench (Olympiad-level problems).

### Core Results
- **Reasoning length reduced by 59.9%**, with over 70% compression for some simple problems;
- **Accuracy increased by 4.8 percentage points**, which suggests that overthinking impairs performance;
- **Cross-model Consistency**: Reported to apply across models such as Llama, Qwen, and GPT-4.
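
To put the headline number in concrete terms, the arithmetic below applies the reported 59.9% average reduction to a single trace. This is only an illustration: the reduction is an average, so individual traces will differ, and the helper names `compressed_length` and `length_ratio` are introduced here for the example.

```python
def compressed_length(original_tokens: int, reduction: float = 0.599) -> int:
    """Tokens remaining after removing the given fraction of the trace."""
    return round(original_tokens * (1.0 - reduction))


def length_ratio(reduction: float = 0.599) -> float:
    """How many times longer the original trace is than the compressed one."""
    return 1.0 / (1.0 - reduction)


print(compressed_length(1000))   # 401
print(round(length_ratio(), 2))  # 2.49
```

A 1000-token trace would shrink to about 401 tokens, so the original is roughly 2.5 times longer than the compressed version. Token count is only a proxy for cost and latency, which also depend on the serving setup.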

### Ablation Experiments
- Removing state awareness leads to a significant performance drop;
- Knowledge guidance + self-prompt achieves the best effect;
- Early stopping mechanism saves computation and improves accuracy simultaneously;
- Hybrid training is better than pure PPO or pure DPO.

## Application Prospects: Implications for Deployment and Research

### Deployment Significance
- **Cost Reduction**: Cutting reasoning length by roughly 60% lowers computational costs;
- **Better User Experience**: Lower latency benefits real-time interaction scenarios;
- **Environmental Protection**: Reduces energy consumption and carbon emissions.

### Research Implications
- Efficiency and capability need not trade off against each other; an intelligent system has to "know when to stop";
- Metacognitive ability (awareness of one's own reasoning state) is a promising direction for improvement;
- Retrieval-augmented generation (RAG) can be used to optimize the reasoning process.

## Limitations and Future Work

### Limitations
- **Domain Generalization**: Only verified on mathematical reasoning; needs to be extended to creative writing, dialogue, etc.;
- **Knowledge Base Dependence**: The effect of knowledge guidance is affected by the quality of external knowledge bases;
- **Compression Limit**: Accuracy drops once compression exceeds a certain threshold; the optimal compression ratio remains to be determined;
- **Interpretability**: The logic of compression decisions is not transparent enough.

### Future Directions
Future work includes exploring cross-domain adaptation, reducing dependence on knowledge base quality, studying compression limits, and improving interpretability.