Section 01
[Introduction] AgentWatcher: A Scalable and Interpretable Monitoring System for Solving Prompt Injection Attacks
AgentWatcher is a monitoring system for prompt injection attacks. It focuses on key context segments through causal attribution, combines explicit rule-based reasoning, and achieves scalable and interpretable detection in long-context scenarios, effectively balancing security and practicality. This article will introduce it from aspects such as background, methods, and experimental verification.