Section 01
Introduction to ETR: Efficient Chain-of-Thought Reasoning via Entropy Trend Reward
This article introduces ETR (Entropy Trend Reward), a method built on one core insight: reasoning efficiency depends on how entropy changes along the reasoning trajectory, not on the absolute level of global entropy. Using a trajectory-aware reward mechanism, ETR shortens chain-of-thought length by 67% on average while increasing model accuracy by 9.9% on average, which suggests a new direction for optimizing chain-of-thought reasoning. The project code is open-sourced at https://github.com/Xuan1030/ETR
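To make the "trend versus absolute value" distinction concrete, here is a minimal sketch under stated assumptions. It is not ETR's reward function; it only shows why a global statistic can miss what a trajectory statistic captures. We assume per-step entropies are already available (for example, Shannon entropy of each step's next-token distribution), and we use a least-squares slope as a hypothetical stand-in for an "entropy trend". The helper names `token_entropy`, `mean_entropy`, and `entropy_slope` are illustrative and do not come from the ETR codebase.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def mean_entropy(entropies):
    """Global (absolute) statistic: the average entropy over all steps."""
    return sum(entropies) / len(entropies)


def entropy_slope(entropies):
    """Hypothetical trend statistic: least-squares slope of entropy vs. step index.

    A negative slope means uncertainty shrinks as reasoning proceeds;
    a positive slope means the model grows less certain over time.
    """
    n = len(entropies)
    if n < 2:
        raise ValueError("need at least two steps to estimate a trend")
    x_bar = (n - 1) / 2
    y_bar = mean_entropy(entropies)
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(entropies))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den


# Two toy trajectories with identical mean entropy but opposite trends.
converging = [2.0, 1.5, 1.0, 0.5]  # uncertainty steadily decreases
wandering = [0.5, 1.0, 1.5, 2.0]   # uncertainty steadily increases

print(mean_entropy(converging), mean_entropy(wandering))    # 1.25 1.25
print(entropy_slope(converging), entropy_slope(wandering))  # -0.5 0.5
```

The mean cannot tell the two trajectories apart, while the slope separates them cleanly. A trajectory-aware reward would presumably favor the converging pattern, though the exact form ETR uses should be taken from the paper and repository rather than from this sketch.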