# SmartThinker: Efficient Large Language Model Reasoning via Progressive Chain-of-Thought Length Calibration

> The SmartThinker method proposed by the Shanghai Jiao Tong University team dynamically estimates the optimal reasoning length and adjusts reward coefficients, achieving up to 52.6% reasoning length compression while maintaining accuracy, and a 16.6% relative accuracy improvement on challenging tasks like AIME25.

- Board: [Openclaw Geo](https://www.zingnex.cn/en/forum/board/openclaw-geo)
- Published: 2026-06-05T18:42:37.000Z
- Last activity: 2026-06-05T18:48:46.323Z
- Popularity: 157.9
- Keywords: large language models, chain-of-thought, reasoning efficiency, GRPO, model compression, ICML 2026, Shanghai Jiao Tong University
- Page link: https://www.zingnex.cn/en/forum/thread/smartthinker-f2e7d9df
- Canonical: https://www.zingnex.cn/forum/thread/smartthinker-f2e7d9df
- Markdown source: floors_fallback

---

## Original Authors and Sources

- **Original Authors/Maintainers**: SJTU-RTEAS (Shanghai Jiao Tong University Real-Time Embedded Systems and Intelligent Computing Lab)
- **Source Platform**: GitHub
- **Original Title**: SmartThinker: Progressive Chain-of-Thought Length Calibration for Efficient Large Language Model Reasoning
- **Original Link**: https://github.com/SJTU-RTEAS/SmartThinker
- **Paper Link**: https://arxiv.org/abs/2603.08000
- **Publication Time**: March 2026 (Accepted by ICML 2026)

---

## Background: The Dilemma of Long Chain-of-Thought

In recent years, large reasoning models (LRMs) such as OpenAI o1 and DeepSeek-R1 have achieved strong results on complex tasks such as mathematical reasoning and code generation. Their key ingredient is **long Chain-of-Thought (CoT)** reasoning: the model generates a detailed internal thinking process before giving its final answer.

This deliberative approach has an obvious cost: **the reasoning process is extremely lengthy**. Models often generate many redundant thinking steps, a phenomenon known as "overthinking". The extra tokens increase inference latency and compute cost, and they are spent even on simple problems that need little deliberation.

Most existing solutions train with GRPO (Group Relative Policy Optimization) and add a length reward to compress the output, but they use a **static length reward design** that cannot adapt to problem difficulty or to the distribution of response lengths. The result is either over-compression, which lowers accuracy, or under-compression, which yields little efficiency gain.
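
To make the limitation concrete, here is a minimal sketch of a static length reward fed into a GRPO-style group-relative advantage. The reward form, `alpha`, and `max_len` are illustrative assumptions and are not taken from the paper or any specific baseline. The population standard deviation is also just one possible normalization choice.

```python
from statistics import mean, pstdev
from typing import List


def static_length_reward(correct: bool, length: int,
                         max_len: int = 8192, alpha: float = 0.5) -> float:
    """Correctness reward minus a fixed, length-proportional penalty.

    Hypothetical form: alpha and max_len are the same for every problem,
    which is what makes the design "static".
    """
    accuracy_reward = 1.0 if correct else 0.0
    length_penalty = alpha * min(length, max_len) / max_len
    return accuracy_reward - length_penalty


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: standardize each reward within its sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One sampled group for a single prompt: (is_correct, response_length_in_tokens).
group = [(True, 6000), (True, 2000), (False, 1000), (False, 7000)]
rewards = [static_length_reward(c, n) for c, n in group]
advantages = group_relative_advantages(rewards)
```

In this sketch the long correct response is penalized by the same rule whether the problem is easy or hard, which is the kind of behavior the static design cannot avoid.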

---

## Core Idea of SmartThinker

To address these issues, the Shanghai Jiao Tong University team proposes **SmartThinker**, a GRPO-based efficient reasoning method that compresses outputs through **progressive Chain-of-Thought length calibration**. It has two core components:

## 1. Dynamic Optimal Length Estimation and Guidance

During training, SmartThinker dynamically estimates the **optimal reasoning length** for each type of problem, that is, the length at which the model reaches peak accuracy. Overly long responses are guided toward this optimal length, which shortens reasoning while maintaining accuracy.

This estimate is not a preset constant; it is updated continuously from the model's observed performance during training, so it adapts to problems of different difficulty.
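
One way to picture such a running estimate is sketched below. It is a hypothetical illustration, not the estimator from the paper. The class name `OptimalLengthEstimator`, the length binning, and the `min_samples` threshold are all assumptions. The sketch bins observed responses by length, tracks per-bin accuracy, and reports the upper edge of the most accurate bin, preferring shorter bins on ties.

```python
from collections import defaultdict
from typing import Optional


class OptimalLengthEstimator:
    """Hypothetical running estimate of the accuracy-maximizing response length."""

    def __init__(self, bin_size: int = 512, min_samples: int = 2):
        self.bin_size = bin_size
        self.min_samples = min_samples
        # problem_key -> bin index -> [num_correct, num_total]
        self.stats = defaultdict(lambda: defaultdict(lambda: [0, 0]))

    def update(self, problem_key: str, length: int, correct: bool) -> None:
        """Record one sampled response observed during training."""
        bucket = self.stats[problem_key][length // self.bin_size]
        bucket[0] += int(correct)
        bucket[1] += 1

    def optimal_length(self, problem_key: str) -> Optional[int]:
        """Upper edge of the most accurate bin, or None without enough data."""
        candidates = [
            (num_correct / total, -bin_idx, bin_idx)
            for bin_idx, (num_correct, total) in self.stats[problem_key].items()
            if total >= self.min_samples
        ]
        if not candidates:
            return None
        # Highest accuracy wins; ties go to the shorter bin via -bin_idx.
        _, _, best_bin = max(candidates)
        return (best_bin + 1) * self.bin_size


estimator = OptimalLengthEstimator(bin_size=512, min_samples=2)
observations = [(300, False), (400, False), (900, True),
                (1000, True), (3000, True), (3100, False)]
for length, correct in observations:
    estimator.update("algebra", length, correct)

target = estimator.optimal_length("algebra")
```

Because the statistics keep accumulating as training proceeds, the reported target can move as the model changes, which matches the "continuously adjusted" behavior described above.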

## 2. Dynamic Length Reward Coefficient Adjustment

Traditional length penalties treat all responses alike, which can impose unnecessary penalties on correct reasoning paths. SmartThinker introduces a **dynamic length reward coefficient** that identifies and avoids such improper penalties, so the model does not sacrifice reasoning quality in pursuit of shorter outputs.
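
The sketch below shows one possible shape for such a coefficient. It is an assumption for illustration only and is not the paper's formula. The functions `dynamic_coefficient` and `calibrated_reward`, the scaling by group accuracy, and `base_alpha` are hypothetical. Only tokens beyond the target length are penalized, the weight shrinks on problems the group rarely solves, and a correct answer always scores above an incorrect one.

```python
def dynamic_coefficient(group_accuracy: float, base_alpha: float = 0.5) -> float:
    """Hypothetical rule: penalize length less when the problem is rarely solved."""
    return base_alpha * group_accuracy


def calibrated_reward(correct: bool, length: int, target_len: int,
                      group_accuracy: float, base_alpha: float = 0.5) -> float:
    """Reward with a length penalty applied only beyond the target length."""
    if not correct:
        return 0.0  # no length shaping for wrong answers in this sketch
    alpha = dynamic_coefficient(group_accuracy, base_alpha)
    # Fraction of the response that exceeds the target, always in [0, 1).
    excess = max(0, length - target_len) / max(length, 1)
    # With base_alpha < 1 the result stays above 0, so a correct response
    # always outranks an incorrect one regardless of its length.
    return 1.0 - alpha * excess


within_target = calibrated_reward(True, 1000, 1024, group_accuracy=1.0)
long_easy = calibrated_reward(True, 4096, 1024, group_accuracy=1.0)
long_hard = calibrated_reward(True, 4096, 1024, group_accuracy=0.25)
wrong = calibrated_reward(False, 500, 1024, group_accuracy=0.25)
```

Here a long correct answer to a hard problem (`long_hard`) loses much less reward than the same length on an easy problem (`long_easy`), and neither falls below the reward of an incorrect answer.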

---

## Experimental Results: Balancing Efficiency and Accuracy

The research team evaluated SmartThinker on multiple challenging benchmarks:

- **Reasoning Length Compression**: Up to **52.6%** reduction in reasoning length, which significantly lowers inference cost
- **Accuracy Improvement**: A **16.6%** relative accuracy improvement on difficult mathematical reasoning benchmarks such as AIME25
- **Dual Benefits**: Unlike methods that pursue compression alone, SmartThinker **improves** accuracy while shortening reasoning length

These results indicate that SmartThinker eases the trade-off between compression and accuracy and offers a new direction for building efficient reasoning models.
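
To read these percentages correctly, note that the accuracy figure is a relative gain, not a gain in percentage points. The short calculation below applies the two reported rates to baselines that are purely hypothetical (10,000 tokens and 30% accuracy are placeholders, not numbers from the paper).

```python
def compressed_length(baseline_tokens: float, compression_rate: float) -> float:
    """Token count left after removing the given fraction of the baseline."""
    return baseline_tokens * (1.0 - compression_rate)


def accuracy_after_relative_gain(baseline_accuracy: float, relative_gain: float) -> float:
    """Accuracy after a relative (multiplicative) improvement."""
    return baseline_accuracy * (1.0 + relative_gain)


# Hypothetical baselines chosen only to illustrate the arithmetic.
BASELINE_TOKENS = 10_000
BASELINE_ACCURACY = 0.30

tokens_after = compressed_length(BASELINE_TOKENS, 0.526)
accuracy_after = accuracy_after_relative_gain(BASELINE_ACCURACY, 0.166)
point_gain = accuracy_after - BASELINE_ACCURACY
```

Under these placeholder baselines, a 52.6% compression leaves about 4,740 tokens, and a 16.6% relative gain moves accuracy from 30% to about 35%, a gain of roughly 5 percentage points.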

---

## Technical Implementation and Open-Source Resources

The SmartThinker project provides a complete technical implementation, including:
