SmartThinker: Efficient Large Language Model Reasoning via Progressive Chain-of-Thought Length Calibration

The SmartThinker method proposed by the Shanghai Jiao Tong University team dynamically estimates the optimal reasoning length and adjusts reward coefficients, achieving up to 52.6% reasoning length compression while maintaining accuracy, and a 16.6% relative accuracy improvement on challenging tasks like AIME25.

Tags: Large Language Models, Chain-of-Thought, Reasoning Efficiency, GRPO, Model Compression, ICML 2026, Shanghai Jiao Tong University

Published: 2026-06-06 02:42 | Recent activity: 2026-06-06 02:48 | Estimated read: 7 min

Section 01

Introduction: SmartThinker: Efficient Large Language Model Reasoning via Progressive Chain-of-Thought Length Calibration

Section 02

Original Authors and Sources

Original Authors/Maintainers: SJTU-RTEAS (Shanghai Jiao Tong University Real-Time Embedded Systems and Intelligent Computing Lab)
Source Platform: GitHub
Original Title: SmartThinker: Progressive Chain-of-Thought Length Calibration for Efficient Large Language Model Reasoning
Original Link: https://github.com/SJTU-RTEAS/SmartThinker
Paper Link: https://arxiv.org/abs/2603.08000
Publication Time: March 2026 (Accepted by ICML 2026)

Section 03

Background: The Dilemma of Long Chain-of-Thought

In recent years, large reasoning models (LRMs) such as OpenAI o1 and DeepSeek-R1 have achieved strong results on complex tasks such as mathematical reasoning and code generation. These models rely on long Chain-of-Thought (CoT) reasoning: they generate a detailed internal thinking process before giving the final answer.

However, this deliberative approach has an obvious cost: the reasoning process is very long. Models often generate many redundant thinking steps, a phenomenon known as "overthinking". This increases inference latency and computational cost, and it is especially wasteful on simple problems that do not need extended reasoning.

Most existing solutions use the GRPO (Group Relative Policy Optimization) algorithm to compress output length, but they rely on a static length reward that cannot adapt to problem difficulty or to the distribution of response lengths. As a result, they either over-compress and lose accuracy, or under-compress and gain little efficiency.
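To make the limitation concrete, here is a minimal sketch of a static length reward combined with GRPO-style group-relative advantages. The within-group standardization is the standard GRPO idea; the reward shape, the function names, and the values of `alpha` and `max_len` are assumptions chosen for illustration and are not taken from the SmartThinker paper or from any specific prior method.

```python
from statistics import mean, pstdev


def static_length_reward(correct: bool, length: int, max_len: int,
                         alpha: float = 0.2) -> float:
    """Correctness reward minus a fixed length penalty.

    alpha is a hypothetical constant: it is the same for every prompt,
    regardless of how hard the prompt is.
    """
    return (1.0 if correct else 0.0) - alpha * min(length, max_len) / max_len


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: standardize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ on this detail
    return [(r - mu) / (sigma + eps) for r in rewards]


# One group of sampled responses to the same prompt: (is_correct, token_length)
group = [(True, 2000), (True, 6000), (False, 1000), (False, 8000)]
rewards = [static_length_reward(c, n, max_len=8000) for c, n in group]
advantages = group_relative_advantages(rewards)

for (correct, length), adv in zip(group, advantages):
    print(f"correct={correct!s:5} length={length:5d} advantage={adv:+.3f}")
```

Because `alpha` never changes, a long but correct answer to a hard problem is penalized exactly as strongly as a needlessly long answer to an easy one, which is the weakness the paragraph above describes.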

Section 04

Core Idea of SmartThinker

To address these issues, the Shanghai Jiao Tong University team proposed SmartThinker, an efficient reasoning method built on GRPO that compresses output through progressive Chain-of-Thought length calibration. Its core innovations can be summarized in two points:

Section 05

1. Dynamic Optimal Length Estimation and Guidance

During training, SmartThinker dynamically estimates the optimal reasoning length for each type of problem, defined as the length at which the model reaches peak accuracy. Overly long responses are guided toward this optimal length, which shortens reasoning while maintaining accuracy.

The estimate is not a preset fixed value. It is updated continuously from the model's actual performance during training, so it can adapt to problems of different difficulty.
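The article does not give the estimation formula, so the following is only a sketch of one way such an estimate could be maintained. It assumes length bucketing plus an exponential moving average; `estimate_peak_length`, `RunningTargetLength`, `bucket_size`, and `momentum` are hypothetical names and parameters, not SmartThinker's actual implementation.

```python
from collections import defaultdict


def estimate_peak_length(samples, bucket_size=1000):
    """Return the midpoint of the length bucket with the highest accuracy.

    samples: list of (token_length, is_correct) pairs for one problem group.
    Ties go to the shorter bucket. Returns None if no sample was correct.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for length, correct in samples:
        bucket = length // bucket_size
        totals[bucket] += 1
        hits[bucket] += int(correct)

    best, best_acc = None, 0.0
    for bucket in sorted(totals):           # ascending, so shorter buckets win ties
        acc = hits[bucket] / totals[bucket]
        if acc > best_acc:
            best, best_acc = bucket, acc
    if best is None:
        return None
    return best * bucket_size + bucket_size // 2


class RunningTargetLength:
    """Exponential moving average of per-step estimates (momentum is hypothetical)."""

    def __init__(self, momentum=0.9):
        self.momentum = momentum
        self.value = None

    def update(self, estimate):
        if estimate is None:                # no correct samples: keep the old target
            return self.value
        if self.value is None:
            self.value = float(estimate)
        else:
            self.value = self.momentum * self.value + (1 - self.momentum) * estimate
        return self.value


samples = [(500, False), (1500, True), (1700, True),
           (3200, True), (3900, False), (5200, False)]
tracker = RunningTargetLength()
print(tracker.update(estimate_peak_length(samples)))
```

The moving average keeps the target from jumping on a single noisy batch while still letting it drift as the model improves, which matches the "continuously updated" behavior described above.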

Section 06

2. Dynamic Length Reward Coefficient Adjustment

Traditional length penalties treat all responses alike, which can penalize correct reasoning paths unnecessarily. SmartThinker introduces a dynamic length reward coefficient that identifies and avoids improper penalties on correct reasoning paths, so the model does not sacrifice reasoning quality in pursuit of shorter outputs.
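As a rough illustration of the idea, the sketch below scales the length coefficient by group accuracy and penalizes only correct responses that exceed a target length. This is an assumed design, not the paper's formula: `length_coefficient`, `shaped_reward`, `max_coef`, and the overshoot definition are all hypothetical.

```python
def length_coefficient(group_accuracy: float, max_coef: float = 0.5) -> float:
    """Hypothetical rule: hard prompts (low group accuracy) get a weaker length push."""
    return max_coef * group_accuracy


def shaped_reward(correct: bool, length: int, target_len: float, coef: float) -> float:
    """Reward with a length term that never pushes a correct answer below a wrong one."""
    if not correct:
        return 0.0                      # incorrect: no length term at all
    if length <= target_len:
        return 1.0                      # correct and within target: no penalty
    overshoot = (length - target_len) / length   # strictly between 0 and 1
    return 1.0 - coef * overshoot       # stays positive as long as coef < 1


# One sampled group: (is_correct, token_length); target_len is assumed given
group = [(True, 1500), (True, 4000), (False, 900), (False, 7000)]
target_len = 2000
accuracy = sum(c for c, _ in group) / len(group)
coef = length_coefficient(accuracy)
rewards = [shaped_reward(c, n, target_len, coef) for c, n in group]
print(coef, rewards)
```

With `max_coef` below 1, every correct response keeps a strictly higher reward than every incorrect one, so the length term only reorders correct answers among themselves.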

Section 07

Experimental Results: Balancing Efficiency and Accuracy

The research team evaluated SmartThinker on multiple challenging benchmarks and reported the following results:

Reasoning Length Compression: Up to 52.6% shorter reasoning, which reduces inference cost
Accuracy Improvement: A 16.6% relative accuracy improvement on high-difficulty mathematical reasoning benchmarks such as AIME25
Dual Benefits: Unlike methods that pursue compression alone, SmartThinker improves accuracy while shortening reasoning length
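For readers unsure how the two headline metrics above are usually computed, here is a small sketch. The formulas are the conventional definitions of compression rate and relative improvement; the input numbers are made up purely to show the arithmetic and are not results from the paper.

```python
def compression_rate(baseline_tokens: float, compressed_tokens: float) -> float:
    """Fraction of tokens removed relative to the baseline."""
    return 1.0 - compressed_tokens / baseline_tokens


def relative_improvement(baseline_acc: float, new_acc: float) -> float:
    """Accuracy gain expressed as a fraction of the baseline accuracy."""
    return (new_acc - baseline_acc) / baseline_acc


# Hypothetical inputs, chosen only to illustrate the arithmetic
print(f"{compression_rate(10_000, 4_740):.1%}")     # 52.6%
print(f"{relative_improvement(30.0, 34.98):.1%}")   # 16.6%
```

Note that a relative improvement is not the same as a gain in percentage points: in this made-up example, a 16.6% relative improvement corresponds to roughly 5 points of absolute accuracy.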

These results indicate that SmartThinker eases the trade-off between compression and accuracy and offers a new direction for the development of efficient reasoning models.

Section 08

Technical Implementation and Open-Source Resources

The SmartThinker project provides a complete technical implementation in its GitHub repository: https://github.com/SJTU-RTEAS/SmartThinker
