Zing Forum

STACK: An Efficient Reasoning Compression Framework for Large Reasoning Models to "Think Less, Do More"

This article introduces the STACK framework, which reduces reasoning length by 59.9% while maintaining or even improving accuracy through state-aware reasoning compression and knowledge guidance. This method dynamically identifies redundant reasoning steps and combines PPO and DPO training strategies, opening up a new path for efficiency optimization of large reasoning models.

Tags: large reasoning models, chain-of-thought compression, efficient reasoning, PPO, DPO, retrieval augmentation, overthinking, machine learning
Published 2026-04-10 17:31 · Recent activity 2026-04-13 09:53 · Estimated read 8 min

Section 01

Introduction to the STACK Framework: A New Path for Efficient Reasoning of Large Reasoning Models

Large reasoning models (e.g., OpenAI o1, DeepSeek-R1) rely on lengthy thought chains to achieve breakthroughs in complex tasks, but overthinking leads to high computational costs, reasoning delays, and decreased accuracy. The STACK framework, through state-aware reasoning compression and knowledge guidance, reduces reasoning length by 59.9% while increasing accuracy by 4.8 percentage points across three mathematical reasoning benchmarks, opening a new path for efficiency optimization of large models.


Section 02

Background: Overthinking Problems of Large Reasoning Models and Limitations of Existing Compression Methods

Overthinking Phenomena

  1. Redundant Verification Loops: After reaching an initial conclusion, repeatedly verifying the same step, generating a large number of tokens with no new information;
  2. Self-Correction Quagmire: Falling into a cycle of doubt and correction, which may eventually lead to wrong answers;
  3. Irrelevant Knowledge Proliferation: Invoking background knowledge unrelated to the problem, wasting compute and introducing distractors.

Limitations of Existing Compression Methods

  • Coarse-grained Compression: Lacks fine-grained analysis, easily deletes key steps or retains redundancy;
  • Static Strategies: Fixed rules cannot adapt to dynamic reasoning stages;
  • Trade-off Dilemma: Aggressive compression sacrifices accuracy, while conservative compression fails to solve the root problem.

Section 03

Core Design of the STACK Framework: State Awareness and Dynamic Compression Mechanism

STACK solves the problem through three innovations:

  1. State Awareness: Dynamically identifies two redundant states—uncertain/biased state (requires external knowledge guidance) and overconfident long reasoning state (can be self-compressed);
  2. Dual Compression Mechanism:
    • Knowledge-guided Compression: Retrieves external knowledge bases to correct biases, provide compression references, and enhance confidence;
    • Self-prompt Compression: Guides the model to identify repeated steps and generate concise equivalent reasoning;
  3. Early Stopping on Answer Convergence: Terminates reasoning when the answer remains the same for N consecutive steps and confidence is stable, suppressing redundant verification.
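The third mechanism, early stopping on answer convergence, can be illustrated with a minimal sketch. The paper does not publish code, so the function name `should_stop` and the parameters `n` and `conf_tol` are assumptions; the rule itself follows the description above: stop once the intermediate answer has been identical for N consecutive steps and confidence has stabilized.

```python
def should_stop(answers, confidences, n=3, conf_tol=0.02):
    """Hypothetical sketch of STACK's early-stopping rule.

    answers     -- intermediate answers, one per reasoning step
    confidences -- the model's confidence at each step (0..1)
    Stop when the last n answers are identical AND the confidence
    spread over those steps is below conf_tol (i.e. stable).
    """
    if len(answers) < n:
        return False
    recent_answers = answers[-n:]
    recent_conf = confidences[-n:]
    same_answer = all(a == recent_answers[0] for a in recent_answers)
    stable_conf = (max(recent_conf) - min(recent_conf)) < conf_tol
    return same_answer and stable_conf
```

Checking both conditions matters: a repeated answer with swinging confidence still signals the self-correction quagmire described in Section 02, so stopping there would be premature.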

Section 04

Training Strategy: Hybrid Training with PPO and DPO Collaboration

Online Comparative Sample Construction

For each problem, generate a long version (an unconstrained chain of thought) and a short version (compressed reasoning) to serve as a preference pair.
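A sketch of how such pairs might be filtered, assuming the standard DPO convention of (chosen, rejected) tuples. The `Trace` dataclass and the rule of keeping a pair only when both traces reach the gold answer are assumptions for illustration, not details from the paper:

```python
from dataclasses import dataclass

@dataclass
class Trace:
    text: str    # the full reasoning text
    answer: str  # the final answer extracted from it

def build_preference_pairs(samples, gold_answers):
    """Hypothetical sketch of online comparative sample construction.

    samples      -- list of (long_trace, short_trace) pairs sampled
                    online by the policy for each problem
    gold_answers -- reference answers, one per problem
    Keeps a (chosen, rejected) pair only when both traces are correct,
    preferring the shorter (compressed) trace as 'chosen'.
    """
    pairs = []
    for (long_t, short_t), gold in zip(samples, gold_answers):
        if long_t.answer == gold and short_t.answer == gold:
            pairs.append((short_t.text, long_t.text))  # (chosen, rejected)
    return pairs
```

Restricting pairs to cases where both traces are correct keeps the preference signal about length alone, rather than conflating brevity with correctness.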

Hybrid Training Objectives

  • PPO Component: Optimizes the policy network to stably select compression actions;
  • DPO Component: Uses preference signals to train concise reasoning generation;
  • Reward Function: Includes accuracy rewards (positive for correct answers/negative for wrong answers) and efficiency rewards (higher for shorter lengths, with an over-compression threshold set).
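The reward described above can be sketched as a simple scalar. The weights, the `min_length_ratio` over-compression threshold, and the linear efficiency term are all assumptions chosen to match the stated behavior (positive for correct, negative for wrong, larger bonus for shorter traces, penalty below the threshold):

```python
def stack_reward(correct, length, ref_length,
                 min_length_ratio=0.2, acc_weight=1.0, eff_weight=0.5):
    """Hypothetical sketch of STACK's reward function.

    correct    -- whether the final answer is right
    length     -- token length of the compressed trace
    ref_length -- token length of the uncompressed reference trace
    """
    accuracy = 1.0 if correct else -1.0
    ratio = length / max(ref_length, 1)
    if ratio < min_length_ratio:
        efficiency = -1.0          # over-compressed: withdraw the bonus
    else:
        efficiency = 1.0 - ratio   # shorter trace => larger bonus
    return acc_weight * accuracy + eff_weight * efficiency
```

Without the over-compression guard, the efficiency term alone would push the policy toward degenerate one-line "reasoning", which is exactly the trade-off dilemma Section 02 warns about.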

Section 05

Experimental Validation: Win-Win of Efficiency and Accuracy

Benchmark Settings

Tested on three benchmarks: GSM8K (elementary school math), MATH (high school competition), and OlympiadBench (Olympiad-level difficult problems).

Core Results

  • Reasoning length reduced by 59.9%, with over 70% compression for some simple problems;
  • Accuracy increased by 4.8 percentage points, proving that overthinking impairs performance;
  • Cross-model Consistency: Effective across model families such as Llama, Qwen, and GPT-4.

Ablation Experiments

  • Removing state awareness leads to a significant performance drop;
  • Knowledge guidance + self-prompt achieves the best effect;
  • Early stopping mechanism saves computation and improves accuracy simultaneously;
  • Hybrid training is better than pure PPO or pure DPO.

Section 06

Application Prospects: Implications for Deployment and Research

Deployment Significance

  • Cost Reduction: Cutting reasoning length by roughly 60% directly lowers computational costs;
  • Experience Improvement: Lower latency improves real-time interaction scenarios;
  • Environmental Protection: Reduces energy consumption and carbon emissions.

Research Implications

  • Efficiency and capability can be achieved simultaneously; intelligence needs to "know when to stop";
  • Metacognitive ability (self-state awareness) is an improvement direction;
  • RAG technology can be used to optimize the reasoning process.

Section 07

Limitations and Future Work

Limitations

  • Domain Generalization: Only verified on mathematical reasoning; needs to be extended to creative writing, dialogue, etc.;
  • Knowledge Base Dependence: The effect of knowledge guidance is affected by the quality of external knowledge bases;
  • Compression Limit: Accuracy drops once compression exceeds a threshold; the optimal compression ratio remains to be determined;
  • Interpretability: The logic of compression decisions is not transparent enough.

Future Directions

Explore cross-domain adaptation, optimize knowledge base dependence, study compression limits, and improve interpretability.