Section 01
Introduction: PUMA—A Semantic-Preserving Early Exit Mechanism for Reasoning Models
PUMA is a semantic-preserving early exit mechanism for reasoning models. It determines the convergence timing by detecting semantic redundancy in the reasoning chain. While maintaining answer accuracy and reasoning chain integrity, it reduces token generation by an average of 26.2%, significantly improving the efficiency of reasoning models. This mechanism addresses the "overthinking" problem of Large Reasoning Models (LRMs) and provides a new perspective for efficient reasoning.