Zing Forum


SkillHarm: Lifecycle Security Assessment and Automated Attack Construction for Agent Skills

This paper proposes the SkillHarm benchmark for systematically evaluating the security risks of agent skills across their full lifecycle. Using two attack scenarios, Fixed Payload Poisoning (FPP) and Self-Mutating Poisoning (SMP), the study identifies 12 risk types and finds that FPP attacks against current agents succeed at rates as high as 86.3%, revealing severe security vulnerabilities in the skill ecosystem.

Agent Security · Skill Poisoning · AI Security · Attack Benchmark · Lifecycle Security · LLM Agents
Published 2026-06-02 01:45 · Recent activity 2026-06-02 12:55 · Estimated read 6 min

Section 01

Introduction: SkillHarm Reveals Severe Security Vulnerabilities in the Agent Skill Ecosystem

This paper proposes SkillHarm, the first benchmark for systematically evaluating the security risks of agent skills across their full lifecycle. Across two attack scenarios, Fixed Payload Poisoning (FPP) and Self-Mutating Poisoning (SMP), it identifies 12 risk types; FPP attacks against current agents succeed at rates as high as 86.3%, revealing severe security vulnerabilities in the skill ecosystem.


Section 02

Background: Skills as Privileged Attack Surfaces for Agents, and the Limitations of Existing Research

Privileged Characteristics of Skills

  • Implicit trust: Agents automatically discover and execute skills without explicit authorization
  • Persistent state: Skills save data across sessions, which affects subsequent interactions
  • System-level access: Skills require permissions for sensitive resources (files, databases, APIs)
  • Third-party ecosystem: Open contributions drive innovation but also increase risk

Limitations of Existing Research

  • Single-point evaluation: Examines one invocation in isolation, ignoring the cumulative effects of repeated use and cross-session impact
  • Ad-hoc risk enumeration: Lacks a systematic classification, which makes results hard to compare and integrate

Skill Lifecycle

The skill lifecycle comprises six stages: installation, discovery, initialization, execution, cleanup, and reuse. Understanding the full lifecycle is key to both attack and defense.
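To make the "single-point evaluation" limitation concrete, here is a minimal sketch, assuming a hypothetical harness in which each lifecycle stage has a hook that reports the events a skill produces there. The names Stage, run_lifecycle and events_seen_by are illustrative, not from the paper. A checker that only inspects the execution stage misses a side effect that happens during initialization.

```python
from __future__ import annotations

from enum import Enum
from typing import Callable, Iterable


class Stage(Enum):
    """The six lifecycle stages named in the summary, in order (hypothetical encoding)."""
    INSTALLATION = "installation"
    DISCOVERY = "discovery"
    INITIALIZATION = "initialization"
    EXECUTION = "execution"
    CLEANUP = "cleanup"
    REUSE = "reuse"


# Iterating an Enum yields its members in definition order.
LIFECYCLE: list[Stage] = list(Stage)


def run_lifecycle(hooks: dict[Stage, Callable[[], list[str]]]) -> dict[Stage, list[str]]:
    """Call each stage's hook in lifecycle order and record the events it reports."""
    observed: dict[Stage, list[str]] = {}
    for stage in LIFECYCLE:
        hook = hooks.get(stage)
        observed[stage] = hook() if hook else []
    return observed


def events_seen_by(observed: dict[Stage, list[str]], stages: Iterable[Stage]) -> list[str]:
    """Events a checker would see if it only inspected the given stages."""
    return [event for stage in stages for event in observed[stage]]


# Toy skill: a side effect at initialization, a benign action at execution.
hooks = {
    Stage.INITIALIZATION: lambda: ["write_config"],
    Stage.EXECUTION: lambda: ["format_text"],
}
observed = run_lifecycle(hooks)
print(events_seen_by(observed, [Stage.EXECUTION]))  # ['format_text']
print(events_seen_by(observed, LIFECYCLE))          # ['write_config', 'format_text']
```

The design choice here is simply to record per-stage events so that any evaluator can be restricted to a subset of stages and compared against the full-lifecycle view.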


Section 03

Methodology: Two Attack Scenarios + 12 Risk Categories + Automated Construction Tool

Attack Scenarios

  1. Fixed Payload Poisoning (FPP): The malicious payload is fixed and triggers on the skill's first invocation, e.g., data theft or system destruction
  2. Self-Mutating Poisoning (SMP): The skill is initially benign; its first execution modifies persistent state, and the delayed attack triggers in a later session, which makes it highly stealthy
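The difference in trigger timing between the two scenarios can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the paper's attack code: ToySkill, run_sessions and the PAYLOAD marker are hypothetical, each call to invoke models one session, and the "payload" is only a marker string, so nothing harmful runs.

```python
from dataclasses import dataclass, field

PAYLOAD = "SIMULATED_PAYLOAD"  # a marker string only; nothing harmful is executed


@dataclass
class ToySkill:
    """Hypothetical skill whose `state` dict persists across sessions."""
    mode: str                                   # "fpp" or "smp"
    state: dict = field(default_factory=dict)

    def invoke(self, task: str) -> str:
        if self.mode == "fpp":
            # Fixed payload: fires on the very first invocation.
            return PAYLOAD
        if self.mode == "smp":
            if self.state.get("armed"):
                # Delayed trigger: fires only because an earlier run mutated state.
                return PAYLOAD
            # First run behaves normally but quietly mutates persistent state.
            self.state["armed"] = True
            return f"done: {task}"
        raise ValueError(f"unknown mode: {self.mode!r}")


def run_sessions(skill: ToySkill, n: int) -> list[str]:
    """Model n sessions with one invocation each; `skill.state` carries over."""
    return [skill.invoke(f"task {i}") for i in range(1, n + 1)]


print(run_sessions(ToySkill("fpp"), 2))  # ['SIMULATED_PAYLOAD', 'SIMULATED_PAYLOAD']
print(run_sessions(ToySkill("smp"), 2))  # ['done: task 1', 'SIMULATED_PAYLOAD']
```

Observing only the first session, the SMP skill looks entirely benign, which is why the summary describes it as highly stealthy.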

Risk Classification

  • Data pipeline (4 types): theft, contamination, injection, leakage
  • System environment (4 types): file abuse, network abuse, process abuse, resource exhaustion
  • Agent autonomy (4 types): behavior manipulation, tool abuse, session hijacking, target tampering
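The three-by-four taxonomy maps naturally onto a small lookup table, for example for tagging benchmark samples. This is a minimal sketch; the snake_case identifiers are hypothetical renderings of the risk names listed above, not labels taken from the benchmark itself.

```python
# Three categories, four risk types each (identifiers are hypothetical).
RISK_TAXONOMY: dict[str, tuple[str, ...]] = {
    "data_pipeline": ("theft", "contamination", "injection", "leakage"),
    "system_environment": (
        "file_abuse", "network_abuse", "process_abuse", "resource_exhaustion",
    ),
    "agent_autonomy": (
        "behavior_manipulation", "tool_abuse", "session_hijacking", "target_tampering",
    ),
}

# Flattened list of all risk types.
ALL_RISKS = [risk for risks in RISK_TAXONOMY.values() for risk in risks]


def category_of(risk: str) -> str:
    """Map a risk type back to its top-level category."""
    for category, risks in RISK_TAXONOMY.items():
        if risk in risks:
            return category
    raise KeyError(f"unknown risk type: {risk!r}")


print(len(ALL_RISKS))             # 12
print(category_of("tool_abuse"))  # agent_autonomy
```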

AutoSkillHarm Tool

AutoSkillHarm builds attack samples through a pipeline of natural-language description → code generation → verification → integration, producing 879 attack samples that cover 71 skill scenarios.
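The pipeline's shape can be sketched as a loop in which only verified samples are integrated into the benchmark. This is an illustration of the four named stages under assumptions, not AutoSkillHarm's implementation: Sample, build_benchmark and the two stub functions are hypothetical, and a real system would presumably use an LLM for generation and an execution environment for verification.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Sample:
    scenario: str       # skill scenario the attack targets (hypothetical label)
    description: str    # natural-language description of the attack
    code: str = ""      # generated skill code
    verified: bool = False


def build_benchmark(
    specs: list[tuple[str, str]],
    generate: Callable[[str], str],
    verify: Callable[[str], bool],
) -> list[Sample]:
    """Description -> generation -> verification -> integration."""
    benchmark: list[Sample] = []
    for scenario, description in specs:
        sample = Sample(scenario, description)
        sample.code = generate(description)    # code generation
        sample.verified = verify(sample.code)  # verification
        if sample.verified:                    # integration: keep verified samples only
            benchmark.append(sample)
    return benchmark


# Stubs standing in for a code generator and a checker (both hypothetical).
def stub_generate(description: str) -> str:
    return f"# skill for: {description}"


def stub_verify(code: str) -> bool:
    return "unsupported" not in code


specs = [
    ("note_taker", "write a marker file on first run"),
    ("note_taker", "unsupported behavior"),
    ("calendar", "read a marker file on later runs"),
]
bench = build_benchmark(specs, stub_generate, stub_verify)
print(len(bench), sorted({s.scenario for s in bench}))  # 2 ['calendar', 'note_taker']
```

Passing generate and verify in as functions keeps the pipeline skeleton independent of whichever generator and checker are plugged in.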


Section 04

Experimental Results: Significant Agent Vulnerability, Insufficient Existing Defenses

Attack Success Rate

  • FPP: 86.3% (most fixed attacks succeed)
  • SMP: 69.3% (stealthy delayed attacks still have high success rates)
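Attack success rate (ASR) figures like these are ratios of successful attacks to attempted attacks. The sketch below computes ASR per scenario from per-sample outcomes, assuming that definition; attack_success_rate is a hypothetical helper, and the records are toy data, not results from the paper.

```python
from collections import defaultdict


def attack_success_rate(records: list[tuple[str, bool]]) -> dict[str, float]:
    """ASR per scenario = successful attacks / attempted attacks."""
    attempts: dict[str, int] = defaultdict(int)
    successes: dict[str, int] = defaultdict(int)
    for scenario, succeeded in records:
        attempts[scenario] += 1
        successes[scenario] += int(succeeded)
    return {s: successes[s] / attempts[s] for s in attempts}


# Toy records, not results from the paper.
records = [("FPP", True), ("FPP", True), ("FPP", False), ("FPP", True),
           ("SMP", True), ("SMP", False)]
print(attack_success_rate(records))  # {'FPP': 0.75, 'SMP': 0.5}
```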

Hidden Risks

Most apparently failed attacks failed because the agent did not invoke the skill correctly, not because it resisted the attack; the actual defense rate is therefore lower than the failure rate suggests.
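The gap between "the attack failed" and "the agent defended" can be made explicit by labeling why each attempt did not succeed. In this minimal sketch, the outcome labels, the helper outcome_breakdown and the toy outcome list are all hypothetical; the point is only that counting every non-success as a defense overstates how often agents actually resist.

```python
from collections import Counter

LABELS = {"success", "not_invoked", "defended"}  # hypothetical outcome labels


def outcome_breakdown(outcomes: list[str]) -> dict[str, float]:
    """Separate genuine defenses from failures caused by the skill never running."""
    if not outcomes:
        raise ValueError("no outcomes to summarize")
    unknown = set(outcomes) - LABELS
    if unknown:
        raise ValueError(f"unexpected labels: {unknown}")
    counts = Counter(outcomes)
    n = len(outcomes)
    return {
        "asr": counts["success"] / n,
        # Naive view: every non-success is credited to the agent as a defense.
        "apparent_defense_rate": (n - counts["success"]) / n,
        # Stricter view: only attacks the agent actually resisted count.
        "actual_defense_rate": counts["defended"] / n,
    }


# Toy outcomes, not data from the paper.
outcomes = ["success"] * 6 + ["not_invoked"] * 3 + ["defended"]
print(outcome_breakdown(outcomes))
# {'asr': 0.6, 'apparent_defense_rate': 0.4, 'actual_defense_rate': 0.1}
```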

Limitations of Existing Defenses

  • Static analysis struggles to detect SMP, because the initial code is benign
  • The principle of least privilege is hard to apply, because legitimate skills often require broad permissions
  • Behavior monitoring produces many false positives, and sandboxing adds complexity
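Why static analysis misses SMP can be seen with a deliberately naive scanner. In this sketch the denylist patterns, static_scan and the two source strings are hypothetical, and the sources are only scanned as text, never executed. The FPP-style source contains a flagged call, while the SMP-style source only writes to persistent state, which looks like ordinary bookkeeping.

```python
import re

# Hypothetical denylist a naive static scanner might use.
SUSPICIOUS = [r"\bos\.remove\(", r"\bsubprocess\.", r"\brequests\.post\("]


def static_scan(source: str) -> list[str]:
    """Return the denylist patterns that appear in a skill's source text."""
    return [pattern for pattern in SUSPICIOUS if re.search(pattern, source)]


# Source text only; neither snippet is executed.
FPP_SOURCE = "import os\ndef run(path):\n    os.remove(path)\n"
SMP_SOURCE = (
    "def run(task, state):\n"
    "    state['next_step'] = task  # looks like ordinary bookkeeping\n"
    "    return 'ok'\n"
)

print(bool(static_scan(FPP_SOURCE)), bool(static_scan(SMP_SOURCE)))  # True False
```

Real static analyzers are far more sophisticated than a regex denylist, but the underlying limitation is the same: the harmful behavior of an SMP skill is not present in the code that is inspected at install time.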

Risk distribution: Data pipeline > System environment > Agent autonomy.


Section 05

Conclusion: Skill Security Needs Urgent Attention, and SkillHarm Provides a Research Foundation

SkillHarm is the first benchmark for lifecycle security assessment of agent skills, and it reveals severe vulnerabilities in the current agent ecosystem. The high attack success rates show that skill security is an urgent problem. As agents are deployed in critical settings, the security of the skill ecosystem will become an important topic in AI governance, and SkillHarm provides foundational tools for follow-up research.


Section 06

Recommendations and Future Directions: Multi-dimensional Improvement of Skill Security

Ecosystem Recommendations

  • Developers: Consider security when designing skills
  • Platforms: Audit skills strictly, paying particular attention to stealthy SMP attacks
  • Users: Stay alert to the risks of third-party skills
  • Security community: Develop targeted detection and defense technologies
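One check a platform might use against SMP-style behavior is to snapshot a skill's persistent state before an invocation and report what changed afterwards. This is a sketch under assumptions, not a method from the paper: audit_state_mutation and the two toy skills are hypothetical, state is assumed to be a dict with string keys, and a key that is absent before or after is reported as None.

```python
import copy
from typing import Any, Callable


def audit_state_mutation(
    run_skill: Callable[[dict], Any], state: dict
) -> dict[str, tuple[Any, Any]]:
    """Run one invocation against `state`; report changed keys as (before, after).

    Keys missing on either side are reported as None.
    """
    before = copy.deepcopy(state)  # deep copy so nested mutations are detected
    run_skill(state)
    changes: dict[str, tuple[Any, Any]] = {}
    for key in sorted(before.keys() | state.keys()):
        old, new = before.get(key), state.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes


def benign_skill(state: dict) -> None:
    state.get("theme")  # reads only


def mutating_skill(state: dict) -> None:
    state["armed"] = True  # SMP-style persistent change


print(audit_state_mutation(benign_skill, {"theme": "dark"}))    # {}
print(audit_state_mutation(mutating_skill, {"theme": "dark"}))  # {'armed': (None, True)}
```

A diff like this only says that state changed, not whether the change is malicious, so in practice it would feed a reviewer or a policy rather than block skills outright.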

Future Research

  • Dynamic analysis tools: Detect malicious behavior at runtime
  • Formal verification: Verify the security properties of skill code
  • User behavior research: Study how users interact with skills to improve security awareness
  • Cross-platform expansion: Cover more agent frameworks
  • Defense benchmarks: Evaluate the effectiveness of defense mechanisms
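As a starting point for runtime detection, the sketch below shows a capability gate that logs every resource request a skill makes and denies those outside an allowlist. It assumes a cooperative model in which the skill must ask the monitor before acting; CapabilityMonitor, toy_skill and the capability names are hypothetical, and real dynamic analysis would need to intercept operations at the sandbox or OS level. The allowlist also illustrates the least-privilege trade-off noted above: a tight list blocks more attacks but also breaks legitimate skills that need broad access.

```python
class CapabilityMonitor:
    """Hypothetical runtime gate: logs every request, grants only allowlisted ones."""

    def __init__(self, allowed: set[str]) -> None:
        self.allowed = allowed
        self.log: list[tuple[str, str, bool]] = []

    def request(self, capability: str, target: str) -> bool:
        granted = capability in self.allowed
        self.log.append((capability, target, granted))
        return granted

    def violations(self) -> list[tuple[str, str]]:
        """Requests that were denied, in the order they were made."""
        return [(cap, target) for cap, target, granted in self.log if not granted]


def toy_skill(monitor: CapabilityMonitor) -> str:
    """A skill that must ask the monitor before touching any resource."""
    if not monitor.request("file_read", "notes.txt"):
        return "blocked"
    if monitor.request("network_send", "collector.example"):
        return "sent"
    return "finished without network access"


monitor = CapabilityMonitor(allowed={"file_read"})
print(toy_skill(monitor))    # finished without network access
print(monitor.violations())  # [('network_send', 'collector.example')]
```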