ARGUS: A Defense Mechanism for Context-Aware Prompt Injection Attacks on LLM Agents

The study introduces the AgentLure benchmark and the ARGUS defense system to address a gap in existing defenses, which overlook context-dependent tasks. By constructing an influence provenance graph that tracks how untrusted context propagates, ARGUS reduces the attack success rate to 3.8% while maintaining 87.5% task utility, significantly outperforming existing defense methods.

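LLM agents, prompt injection attacks, security protection, provenance tracking, context awareness, agent security, decision auditing, adversarial robustness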
Published 2026-05-05 13:37 | Recent activity 2026-05-06 10:36 | Estimated read 5 min

Section 01

ARGUS: A Guide to the Defense Mechanism Against Context-Aware Prompt Injection on LLM Agents

This paper introduces the AgentLure benchmark and the ARGUS defense system to address a gap in existing defenses, which overlook context-dependent tasks. ARGUS constructs an influence provenance graph to track how untrusted context propagates through the agent. It reduces the attack success rate to 3.8% while maintaining 87.5% task utility, significantly outperforming existing defense methods and offering a new path for securing LLM agents.

Section 02

Security Challenges of LLM Agents and Shortcomings of Existing Defenses

The expanded capabilities of LLM agents bring the risk of prompt injection attacks, in which attackers embed malicious instructions to induce unintended operations. Most existing studies assume that tasks are context-independent, which is far from real-world scenarios involving dynamic dependencies, adaptive attacks, and multi-step execution. Existing defenses (input filtering, prompt separation, instruction hierarchy, output monitoring) fail in these settings because they are stateless and do not track how information propagates.

Section 03

AgentLure: A Benchmark for Evaluating Context-Aware Attacks

AgentLure is the first evaluation framework for context-dependent tasks. It covers 4 major domains (personal assistant, code assistant, data analysis, web browsing) and 8 attack vectors (direct, indirect, multi-step, disguised, obfuscated, context hijacking, tool return, and observation injection). The attacks are context-aware: they build on the agent's existing tasks, are disguised as legitimate content, and exploit links in the reasoning chain.
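
To make the benchmark's coverage concrete, the following minimal Python sketch enumerates the domains and attack vectors listed above. The class names, the `Scenario` type, and the idea of pairing every domain with every vector are assumptions for illustration; the source does not say how AgentLure combines domains and vectors or how many test cases it contains.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Domain(Enum):
    PERSONAL_ASSISTANT = "personal assistant"
    CODE_ASSISTANT = "code assistant"
    DATA_ANALYSIS = "data analysis"
    WEB_BROWSING = "web browsing"


class AttackVector(Enum):
    DIRECT = "direct"
    INDIRECT = "indirect"
    MULTI_STEP = "multi-step"
    DISGUISED = "disguised"
    OBFUSCATED = "obfuscated"
    CONTEXT_HIJACKING = "context hijacking"
    TOOL_RETURN = "tool return"
    OBSERVATION_INJECTION = "observation injection"


@dataclass(frozen=True)
class Scenario:
    """Hypothetical container for one (domain, attack vector) pairing."""
    domain: Domain
    vector: AttackVector


def all_scenarios():
    # Assumption: a full cross product of domains and vectors.
    return [Scenario(d, v) for d, v in product(Domain, AttackVector)]


scenarios = all_scenarios()
print(len(scenarios), "domain/vector pairings")
```

Under the cross-product assumption this yields 4 x 8 = 32 pairings; the real benchmark may use a different subset or several cases per pairing.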

Section 04

ARGUS Defense Mechanism: Provenance-Aware Decision Auditing

The core of ARGUS is the influence provenance graph, which records information sources, trust levels, propagation paths, and decision dependencies. The defense runs in three stages: (1) context tagging and tracking, which attaches provenance metadata to each piece of context; (2) decision impact analysis, which identifies the information a decision depends on, constructs the propagation paths, and calculates a weighted impact; and (3) evidence verification and arbitration, which either executes the action based on trusted evidence or triggers protection. Technical highlights include lightweight integration, an adaptive trust model, and adversarial robustness.
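
The three stages can be illustrated with a small Python sketch. This is not the ARGUS implementation: the trust scores, the influence weights, the 0.5 threshold, and all names (`ProvenanceGraph`, `untrusted_influence`, `audit`) are hypothetical, and the graph is assumed to be acyclic.

```python
from dataclasses import dataclass, field

# Hypothetical trust scores per source; the source gives no concrete values.
TRUST = {"user": 1.0, "system": 1.0, "tool": 0.3, "web": 0.1}


@dataclass
class Node:
    node_id: str
    source: str
    # parent_id -> influence weight (assumed positive)
    parents: dict = field(default_factory=dict)


class ProvenanceGraph:
    def __init__(self):
        self.nodes = {}

    def add(self, node_id, source, parents=None):
        # Stage 1: tag each piece of context with provenance metadata.
        self.nodes[node_id] = Node(node_id, source, dict(parents or {}))

    def untrusted_influence(self, node_id):
        # Stage 2: weighted share of influence that traces back to
        # untrusted sources, computed recursively over the parents.
        node = self.nodes[node_id]
        if not node.parents:
            return 1.0 - TRUST[node.source]
        total = sum(node.parents.values())
        if total <= 0:
            raise ValueError("influence weights must sum to a positive value")
        weighted = sum(w * self.untrusted_influence(p)
                       for p, w in node.parents.items())
        return weighted / total

    def audit(self, decision_id, threshold=0.5):
        # Stage 3: execute only if the decision rests mostly on trusted evidence.
        score = self.untrusted_influence(decision_id)
        return "execute" if score <= threshold else "block"


g = ProvenanceGraph()
g.add("user_task", "user")
g.add("web_page", "web")
# A decision driven mainly by the user's request.
g.add("summarize", "agent", {"user_task": 0.9, "web_page": 0.1})
# A decision driven mainly by text found on the web page.
g.add("send_email", "agent", {"user_task": 0.2, "web_page": 0.8})
print(g.audit("summarize"), g.audit("send_email"))
```

In this toy setup the first decision passes the audit and the second is blocked, because most of its influence comes from the low-trust web page. A real system would need to estimate the influence weights rather than take them as given, which the sketch does not attempt.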

Section 05

Experimental Evaluation: Defense Effectiveness and Robustness of ARGUS

On AgentLure, ARGUS reduces the attack success rate to 3.8% (baseline: 52.3%) while maintaining 87.5% task utility, significantly outperforming existing defenses. Under adaptive white-box attacks, the attack success rate remains below 8.2%. Ablation experiments show that the combination of the provenance and verification components is critical.
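
The reported figures can be turned into an absolute and a relative reduction with a short calculation. The helper name `asr_reduction` is made up for this example, and the calculation assumes that the 52.3% baseline is the attack success rate without ARGUS on the same benchmark.

```python
def asr_reduction(baseline_pct, defended_pct):
    """Return (absolute drop in percentage points, relative drop in percent)."""
    absolute = baseline_pct - defended_pct
    relative = absolute / baseline_pct * 100
    return round(absolute, 1), round(relative, 1)


absolute_points, relative_pct = asr_reduction(52.3, 3.8)
print(f"{absolute_points} points, {relative_pct}% relative reduction")
```

Under that assumption, the drop is 48.5 percentage points, or roughly a 92.7% relative reduction in successful attacks.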

Section 06

Conclusions and Future Research Directions

ARGUS balances security and usability, providing a feasible path toward trustworthy agent systems. Its limitations include computational overhead, the complexity of long reasoning chains, and uncertain effectiveness against new kinds of attacks. Future directions include more efficient provenance algorithms, machine learning for identifying suspicious patterns, security mechanisms for multi-agent systems, and standardized evaluation frameworks.