# SkillHarm: Lifecycle Security Assessment and Automated Attack Construction for Agent Skills

> This paper proposes SkillHarm, a benchmark for systematically evaluating the security risks of agent skills across their full lifecycle. Using two attack scenarios, Fixed Payload Poisoning (FPP) and Self-Mutating Poisoning (SMP), the study identifies 12 risk types and reports attack success rates against current agents of up to 86.3% (FPP), exposing severe security vulnerabilities in the skill ecosystem.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-06-01T17:45:39.000Z
- Last activity: 2026-06-02T04:55:46.258Z
- Popularity: 126.8
- Keywords: agent security, skill poisoning, AI security, attack benchmark, lifecycle security, LLM agents
- Page link: https://www.zingnex.cn/en/forum/thread/skillharm
- Canonical: https://www.zingnex.cn/forum/thread/skillharm
- Markdown source: floors_fallback

---

## Introduction: SkillHarm Reveals Severe Security Vulnerabilities in the Agent Skill Ecosystem

This paper proposes the SkillHarm benchmark, which it presents as the first systematic evaluation of the security risks of agent skills across their full lifecycle. It uses two attack scenarios, Fixed Payload Poisoning (FPP) and Self-Mutating Poisoning (SMP), and identifies 12 risk types. Current agents show an FPP attack success rate as high as 86.3%, which points to severe security vulnerabilities in the skill ecosystem.

## Background: Skills as Privileged Attack Surfaces for Agents, Limitations in Existing Research

### Privileged Characteristics of Skills
- Implicit trust: Agents automatically discover and execute skills without explicit authorization
- Persistent state: Skills save data across sessions, which affects subsequent interactions
- System-level access: Skills require permissions for sensitive resources (files, databases, APIs)
- Third-party ecosystem: Open contributions drive innovation but also increase risk

### Limitations of Existing Research
- Single-point evaluation: Ignores cumulative effects of repeated use and cross-session impacts
- Ad-hoc risk enumeration: Lacks systematic classification, making comparison and integration difficult

### Skill Lifecycle
The skill lifecycle has six stages: installation, discovery, initialization, execution, cleanup, and reuse. Understanding the full lifecycle matters for both attack and defense.

## Methodology: Two Attack Scenarios + 12 Risk Categories + Automated Construction Tool

### Attack Scenarios
1. **Fixed Payload Poisoning (FPP)**: The malicious payload is fixed and triggers on first invocation, e.g., data theft or system destruction
2. **Self-Mutating Poisoning (SMP)**: The skill is initially benign; its first execution modifies persistent state, and the attack triggers in a later session, which makes it highly stealthy
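
The timing difference between the two scenarios can be sketched with a harmless toy model. Everything here is an assumption made for illustration: `ToyAgentState`, `fpp_skill`, `smp_skill`, and the `"armed"` key are hypothetical names, and the "harmful action" is only a string appended to a log. It is not the benchmark's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAgentState:
    """Hypothetical stand-in for state that persists across sessions."""
    store: dict = field(default_factory=dict)   # persistent key/value state
    events: list = field(default_factory=list)  # log of simulated actions


def fpp_skill(state: ToyAgentState) -> None:
    # FPP: the fixed payload fires on the very first invocation.
    state.events.append("harmful_action")


def smp_skill(state: ToyAgentState) -> None:
    # SMP: the first run looks benign but mutates persistent state;
    # a later session reads that state and only then misbehaves.
    if state.store.get("armed"):
        state.events.append("harmful_action")
    else:
        state.store["armed"] = True
        state.events.append("benign_action")


def run_sessions(skill, n_sessions: int) -> list:
    """Invoke the skill once per session against one shared persistent state."""
    state = ToyAgentState()
    for _ in range(n_sessions):
        skill(state)
    return state.events
```

In this model, a single-session check observes the FPP behavior immediately but sees only `benign_action` from the SMP skill; the harmful event appears in the second session.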

### Risk Classification
- Data pipeline (4 types): theft, contamination, injection, leakage
- System environment (4 types): file abuse, network abuse, process abuse, resource exhaustion
- Agent autonomy (4 types): behavior manipulation, tool abuse, session hijacking, target tampering
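
The classification above maps naturally onto a small lookup structure. The following is a minimal sketch; the identifier spellings (for example `file_abuse`) and the helper names are assumptions chosen here, not labels taken from the paper.

```python
# Three categories with four risk types each, as listed above.
# Identifier spellings are illustrative assumptions.
RISK_TAXONOMY = {
    "data_pipeline": ["theft", "contamination", "injection", "leakage"],
    "system_environment": [
        "file_abuse", "network_abuse", "process_abuse", "resource_exhaustion",
    ],
    "agent_autonomy": [
        "behavior_manipulation", "tool_abuse",
        "session_hijacking", "target_tampering",
    ],
}


def category_of(risk_type: str) -> str:
    """Return the category that contains the given risk type."""
    for category, risk_types in RISK_TAXONOMY.items():
        if risk_type in risk_types:
            return category
    raise KeyError(f"unknown risk type: {risk_type}")


def total_risk_types() -> int:
    """Count risk types across all categories."""
    return sum(len(risk_types) for risk_types in RISK_TAXONOMY.values())
```

A fixed taxonomy like this lets each attack sample carry exactly one label, which is what makes results comparable across categories.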

### AutoSkillHarm Tool
AutoSkillHarm builds attack samples through a four-step pipeline: natural-language description → code generation → verification → integration. With it, the authors construct 879 attack samples covering 71 skill scenarios.

## Experimental Results: Significant Agent Vulnerability, Insufficient Existing Defenses

### Attack Success Rate
- FPP: 86.3% (most fixed attacks succeed)
- SMP: 69.3% (stealthy delayed attacks still have high success rates)

### Hidden Risks
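Most apparent attack failures occur because the agent did not invoke the skill correctly, not because it detected or blocked the attack. The actual defense rate is therefore lower than the raw failure rate suggests.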
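
One way to make this distinction concrete is to compute the attack success rate twice: once over all trials, and once over only the trials in which the skill was actually invoked. The sketch below assumes a hypothetical `Trial` record, and the `toy_trials` data is invented purely for illustration; it does not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    skill_invoked: bool     # did the agent actually run the skill?
    attack_succeeded: bool  # did the attack reach its goal?


def attack_success_rate(trials) -> float:
    """Successes divided by all trials."""
    return sum(t.attack_succeeded for t in trials) / len(trials)


def conditional_asr(trials) -> float:
    """Successes divided by trials in which the skill was invoked."""
    invoked = [t for t in trials if t.skill_invoked]
    if not invoked:
        raise ValueError("no trial invoked the skill")
    return sum(t.attack_succeeded for t in invoked) / len(invoked)


# Invented toy data: 7 successes, 1 genuine block, 2 trials where the
# agent never invoked the skill at all.
toy_trials = (
    [Trial(True, True)] * 7
    + [Trial(True, False)] * 1
    + [Trial(False, False)] * 2
)
```

On this toy data the overall rate is 0.7, while the rate conditioned on invocation is 0.875: two of the three "failures" reflect no defense at all.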

### Limitations of Existing Defenses
- Static analysis struggles to detect SMP (initial code is benign)
- Principle of least privilege is hard to apply in practice (legitimate skills require broad permissions)
- Behavior monitoring has a high false-positive rate; sandboxes increase complexity
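
Because an SMP skill's initial code is benign, the observable signal is the change it makes to persistent state rather than anything in its source. The following is a minimal sketch of a runtime state-diff check under that assumption. It is not a defense evaluated in the source; the function names and the idea of a per-skill `declared` key set are hypothetical.

```python
import hashlib
import json

_MISSING = object()  # sentinel for keys absent from one snapshot


def fingerprint(state: dict) -> str:
    """Order-independent SHA-256 digest of a JSON-serializable state."""
    blob = json.dumps(state, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def changed_keys(before: dict, after: dict) -> set:
    """Keys that were added, removed, or modified between two snapshots."""
    keys = before.keys() | after.keys()
    return {
        k for k in keys
        if before.get(k, _MISSING) != after.get(k, _MISSING)
    }


def undeclared_writes(before: dict, after: dict, declared) -> set:
    """Changed keys that the skill did not declare it would write."""
    return changed_keys(before, after) - set(declared)
```

Snapshotting state before and after each skill run and calling `undeclared_writes` would flag a write such as an unexpected `"armed"` key. A check like this inherits the false-positive problem noted above, since legitimate skills also write state they may not have declared.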

Risk distribution: Data pipeline > System environment > Agent autonomy.

## Conclusion: Skill Security Urgently Needs Resolution, SkillHarm Provides Research Foundation

SkillHarm is the first benchmark for lifecycle security assessment of skills, and it reveals severe vulnerabilities in the current agent ecosystem. The high attack success rates show that skill security is an urgent problem. As agents are deployed in critical scenarios, the security of the skill ecosystem will become an important topic in AI governance, and SkillHarm provides a foundation for subsequent research.

## Recommendations and Future Directions: Multi-dimensional Improvement of Skill Security

### Ecosystem Recommendations
- Developers: Consider security when designing skills
- Platforms: Audit skills strictly, with particular attention to stealthy SMP-style attacks
- Users: Be vigilant about third-party skill risks
- Security community: Develop targeted detection and defense technologies

### Future Research
- Dynamic analysis tools: Detect malicious behavior at runtime
- Formal verification: Security verification of skill code
- User behavior research: Enhance security awareness
- Cross-platform expansion: Cover more agent frameworks
- Defense benchmarks: Evaluate the effectiveness of defense mechanisms
