Zing Forum

RecurGuard: A Novel Security Mechanism for Real-Time Defense Against Reasoning Token Consumption Attacks

Researchers propose the RecurGuard runtime monitoring framework, which effectively detects reasoning consumption attacks such as OverThink and ExtendAttack by analyzing three signals of the reasoning trajectory: recursion rate, volume growth, and task progress. It achieves a 99% detection rate for OverThink attacks while maintaining a near-zero false positive rate.

Tags: AI security, prompt injection attacks, reasoning models, runtime monitoring, token consumption, denial of service, DeepSeek, LLM security
Published 2026-06-06 11:52 | Recent activity 2026-06-09 10:23 | Estimated read 6 min

Section 01

RecurGuard: A Novel Security Mechanism for Real-Time Defense Against Reasoning Token Consumption Attacks (Introduction)

Researchers propose the RecurGuard runtime monitoring framework, which targets reasoning token consumption attacks (such as OverThink and ExtendAttack) and detects them by real-time analysis of three signals from the reasoning trajectory: recursion rate, volume growth, and task progress. This mechanism achieves a 99% detection rate for OverThink attacks while maintaining a near-zero false positive rate, and can terminate the generation process early to prevent further token consumption.

Section 02

Attack Background: Threats of Reasoning Token Consumption Attacks and Failure of Traditional Defenses

Reasoning token consumption attacks target models with reasoning capabilities (e.g., DeepSeek-R1, OpenAI o-series). Through prompt injection, they induce the model to spend its generation budget on decoy tasks, causing two kinds of harm: denial of service (the model fails to produce a final answer) and denial of wallet (inflated token billing costs). Traditional input-side security classifiers struggle to detect such attacks because the injected prompts look harmless on the surface, with the malicious intent hidden inside plausible task descriptions.
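To make the two harms concrete, here is a minimal arithmetic sketch. The function name, the 16,000-token budget, the 1,000-token baseline, and the price of 10 USD per million tokens are all hypothetical values chosen for illustration; only the 22.8x amplification figure comes from the results reported later in this article.

```python
def attack_impact(baseline_tokens, amplification, budget_tokens, usd_per_million):
    """Estimate billed reasoning tokens, their cost, and whether the budget runs out.

    All inputs are illustrative assumptions, not measurements from the paper.
    """
    wanted = round(baseline_tokens * amplification)
    billed = min(wanted, budget_tokens)        # generation stops at the budget cap
    cost = billed * usd_per_million / 1_000_000
    answer_starved = wanted >= budget_tokens   # nothing left for the final answer
    return billed, cost, answer_starved


benign = attack_impact(1_000, 1.0, 16_000, 10.0)
attacked = attack_impact(1_000, 22.8, 16_000, 10.0)
print(benign)    # billed tokens, cost in USD, budget exhausted?
print(attacked)
```

Under these assumptions the attacked request is billed for the entire budget and still has no tokens left for an answer, which is how a single injection can produce both harms at once.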

3

Section 03

RecurGuard Framework Design and Three Core Detection Signals

RecurGuard assumes that the reasoning trajectory is visible (mainstream reasoning models expose their thinking process) and tracks three complementary signals:

  1. Recursion rate: Detects abnormal loops or repeated thinking in reasoning;
  2. Volume growth: Monitors whether the number of reasoning tokens far exceeds the normal baseline;
  3. Task progress: Evaluates whether reasoning is moving toward the user's original query.

Defense is triggered only when all three signals are abnormal for consecutive reasoning blocks.
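The article does not say how each signal is computed, so the following is only a rough sketch with stand-in proxies: n-gram reuse for recursion rate, cumulative token count over an assumed baseline for volume growth, and word overlap with the user query for task progress. The names `score_block`, `BlockSignals`, and `baseline_tokens` are hypothetical, and a real system would likely use stronger semantic measures.

```python
from dataclasses import dataclass


def ngrams(tokens, n=3):
    """Return the set of n-grams (as tuples) found in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


@dataclass
class BlockSignals:
    recursion_rate: float  # share of this block's trigrams already seen earlier
    volume_growth: float   # cumulative reasoning tokens / assumed baseline
    task_progress: float   # share of query words that appear in this block


def score_block(block, history, query, baseline_tokens=200):
    """Compute illustrative proxies for the three signals on one reasoning block."""
    tokens = block.lower().split()

    # Recursion proxy: how much of this block repeats earlier reasoning blocks.
    seen = set()
    for past in history:
        seen |= ngrams(past.lower().split())
    grams = ngrams(tokens)
    recursion = len(grams & seen) / len(grams) if grams else 0.0

    # Volume proxy: total whitespace-separated tokens so far vs. the baseline.
    total = len(tokens) + sum(len(p.split()) for p in history)
    growth = total / baseline_tokens

    # Progress proxy: crude lexical overlap with the user's original query.
    query_words = set(query.lower().split())
    progress = len(query_words & set(tokens)) / len(query_words) if query_words else 0.0
    return BlockSignals(recursion, growth, progress)


signals = score_block(
    block="solve the sudoku grid row by row again",
    history=["solve the sudoku grid row by row"],
    query="what is the capital of france",
)
print(signals)
```

In this toy input the block mostly repeats the previous one and shares only a stop word with the query, so the recursion proxy is high and the progress proxy is low. Lexical overlap is a deliberately simple choice here; it would miss paraphrased loops.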

Section 04

Detection Logic and Experimental Evaluation Results

RecurGuard uses a joint three-signal decision rule: an attack is declared, and generation is terminated early, only when all three signals are abnormal in three consecutive reasoning blocks. Experimental results show a 99% detection rate for OverThink attacks, 92% for ExtendAttack, and near-zero false positives. In adaptive stress tests, the missed detection rate for topic-related attacks was 50%, and the token amplification rate for fully semantic evasion attacks dropped from 22.8x to 2.2x (roughly a tenfold reduction), significantly increasing the cost of an attack.
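The joint decision rule can be sketched as a small streak counter. This is an illustration under stated assumptions, not the authors' implementation: the class name, the `Thresholds` fields, and the cut-off values are hypothetical, since the article does not report the thresholds used. Only the "all three signals, three consecutive blocks" rule comes from the text.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    # Hypothetical cut-offs; the source does not report the real values.
    max_recursion: float = 0.5
    max_growth: float = 3.0
    min_progress: float = 0.1


class ConsecutiveBlockDetector:
    """Flag an attack only after `window` consecutive all-abnormal blocks."""

    def __init__(self, thresholds=None, window=3):
        self.t = thresholds or Thresholds()
        self.window = window
        self.streak = 0

    def update(self, recursion, growth, progress):
        """Feed one block's signals; return True when generation should stop."""
        abnormal = (
            recursion > self.t.max_recursion
            and growth > self.t.max_growth
            and progress < self.t.min_progress
        )
        # A block that looks normal on any one signal resets the streak.
        self.streak = self.streak + 1 if abnormal else 0
        return self.streak >= self.window


detector = ConsecutiveBlockDetector()
blocks = [  # (recursion, growth, progress) per reasoning block
    (0.8, 4.0, 0.0),
    (0.9, 5.0, 0.0),
    (0.2, 6.0, 0.0),  # recursion looks normal here, so the streak resets
    (0.9, 7.0, 0.0),
    (0.9, 8.0, 0.0),
    (0.9, 9.0, 0.0),
]
decisions = [detector.update(*b) for b in blocks]
print(decisions)  # [False, False, False, False, False, True]
```

Requiring all three signals over a run of blocks trades detection latency for a lower false positive rate: a legitimately long or briefly repetitive reasoning chain does not trip the detector unless it is also making no progress on the query.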

Section 05

Technical Contributions and Practical Significance

The main technical contribution of RecurGuard is a paradigm shift from static input-side detection to dynamic runtime monitoring. For deployment, the implication is that reasoning trajectories are a security resource, and that a defense-in-depth system combining input filtering, runtime monitoring, and output auditing should be built with cost-security trade-offs in mind. For attack research, the work reveals that topic-related attacks are more cost-effective for attackers, and it argues that reducing the attack amplification rate is itself a substantial security improvement.

Section 06

Limitations and Future Research Directions

Current limitations: RecurGuard depends on models exposing their reasoning trajectories (black-box models require the less effective QDM fallback scheme); adaptive topic-related attacks are still missed 50% of the time; and effectiveness in multilingual scenarios remains to be verified. Future directions: finer-grained semantic analysis, adaptive defenses based on online learning, and hardware-level optimization aimed at zero-overhead monitoring.

Section 07

Fallback Scheme and Conclusion

For scenarios where reasoning trajectories are not available, the researchers propose the QDM fallback scheme, which detects attacks based on the final output. Conclusion: RecurGuard adds a new dimension to the security of reasoning models. Defenses need to keep pace with model capabilities, and runtime monitoring is expected to become a standard component of security architectures for deploying reasoning models safely.