SkillHarm: Lifecycle Security Assessment and Automated Attack Construction for Agent Skills

This paper proposes SkillHarm, a benchmark for systematically evaluating the security risks of agent skills across their full lifecycle. Using two attack scenarios, Fixed Payload Poisoning (FPP) and Self-Mutating Poisoning (SMP), the study identifies 12 risk types and reports attack success rates as high as 86.3% against current agents, revealing severe security vulnerabilities in the skill ecosystem.

Agent Security, Skill Poisoning, AI Security, Attack Benchmark, Lifecycle Security, LLM Agents

Published 2026-06-02 01:45 | Recent activity 2026-06-02 12:55 | Estimated read 6 min

Section 01

Introduction: SkillHarm Reveals Severe Security Vulnerabilities in the Agent Skill Ecosystem

SkillHarm is presented as the first benchmark to systematically evaluate the security risks of agent skills across their full lifecycle. Its two attack scenarios, Fixed Payload Poisoning (FPP) and Self-Mutating Poisoning (SMP), cover 12 risk types. Current agents show an FPP attack success rate as high as 86.3%, which points to severe security vulnerabilities in the skill ecosystem.

Section 02

Background: Skills as Privileged Attack Surfaces for Agents, Limitations in Existing Research

Privileged Characteristics of Skills

Implicit trust: Agents automatically discover and execute skills without explicit authorization
Persistent state: Skills save data across sessions, which affects subsequent interactions
System-level access: Skills require permissions for sensitive resources (files, databases, APIs)
Third-party ecosystem: Open contributions drive innovation but also increase risks

Limitations of Existing Research

Single-point evaluation: Ignores cumulative effects of repeated use and cross-session impacts
Ad-hoc risk enumeration: Lacks systematic classification, making comparison and integration difficult

Skill Lifecycle

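The skill lifecycle has six stages: installation, discovery, initialization, execution, cleanup, and reuse. An attack can enter or take effect at any of these stages, so both attackers and defenders need to reason about the whole lifecycle and not a single invocation.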

Section 03

Methodology: Two Attack Scenarios + 12 Risk Categories + Automated Construction Tool

Attack Scenarios

Fixed Payload Poisoning (FPP): The malicious payload is fixed and triggers on the first invocation, for example data theft or system destruction
Self-Mutating Poisoning (SMP): The skill is initially benign; its first execution modifies persistent state, and the attack triggers in a later session, which makes it highly stealthy
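
The difference between the two scenarios can be shown with a harmless toy simulation. This is only a sketch under stated assumptions: `SkillState`, `fpp_skill`, `smp_skill`, and `run_sessions` are hypothetical names, the "payload" is just a string, and a single boolean flag stands in for whatever persistent state a real skill might alter.

```python
from dataclasses import dataclass, field


@dataclass
class SkillState:
    """Hypothetical persistent state that survives across sessions."""
    data: dict = field(default_factory=dict)


def fpp_skill(state: SkillState) -> str:
    # Fixed payload: the simulated malicious action fires on every call,
    # including the very first one.
    return "PAYLOAD"


def smp_skill(state: SkillState) -> str:
    # Self-mutating: the first call behaves normally and only plants a marker.
    if not state.data.get("armed"):
        state.data["armed"] = True
        return "benign"
    # A later session finds the marker and fires the simulated payload.
    return "PAYLOAD"


def run_sessions(skill, n_sessions: int) -> list[str]:
    state = SkillState()  # shared across sessions, like on-disk skill state
    return [skill(state) for _ in range(n_sessions)]


fpp_trace = run_sessions(fpp_skill, 3)
smp_trace = run_sessions(smp_skill, 3)
print(fpp_trace)  # ['PAYLOAD', 'PAYLOAD', 'PAYLOAD']
print(smp_trace)  # ['benign', 'PAYLOAD', 'PAYLOAD']
```

An evaluation that observes only the first session would record the SMP skill as benign, which is why a single-point test cannot capture this scenario.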

Risk Classification

Data pipeline (4 types): Theft/contamination/injection/leakage
System environment (4 types): File/network/process abuse, resource exhaustion
Agent autonomy (4 types): Behavior manipulation/tool abuse/session hijacking/target tampering
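
The three categories and their four risk types each can be written down as a small lookup table. The snake_case identifiers and the `category_of` helper below are illustrative assumptions; the source only gives the category and risk names.

```python
# Three categories x four risk types, as listed above.
RISK_TAXONOMY: dict[str, list[str]] = {
    "data_pipeline": ["theft", "contamination", "injection", "leakage"],
    "system_environment": [
        "file_abuse",
        "network_abuse",
        "process_abuse",
        "resource_exhaustion",
    ],
    "agent_autonomy": [
        "behavior_manipulation",
        "tool_abuse",
        "session_hijacking",
        "target_tampering",
    ],
}


def category_of(risk: str) -> str:
    """Return the category that a risk type belongs to."""
    for category, risks in RISK_TAXONOMY.items():
        if risk in risks:
            return category
    raise KeyError(f"unknown risk type: {risk}")


total_risk_types = sum(len(risks) for risks in RISK_TAXONOMY.values())
print(total_risk_types)           # 12
print(category_of("tool_abuse"))  # agent_autonomy
```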

AutoSkillHarm Tool

AutoSkillHarm follows a four-step pipeline (natural language description → code generation → verification → integration) and uses it to construct 879 attack samples covering 71 skill scenarios.
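
The shape of such a pipeline can be sketched as follows. The source does not describe AutoSkillHarm's interfaces, so every name here (`AttackSample`, `build_sample`, `build_corpus`, and the stub generator and verifier) is hypothetical, and the stubs are deliberately inert: the "generator" emits a comment string and the "verifier" only checks its prefix.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class AttackSample:
    scenario: str        # which skill scenario the sample targets
    description: str     # natural-language description of the attack
    code: str = ""       # output of the code-generation stage
    verified: bool = False


def build_sample(
    scenario: str,
    description: str,
    generate: Callable[[str], str],
    verify: Callable[[str], bool],
) -> Optional[AttackSample]:
    """Run description -> generation -> verification for one sample."""
    sample = AttackSample(scenario, description)
    sample.code = generate(description)
    sample.verified = verify(sample.code)
    return sample if sample.verified else None


def build_corpus(
    specs: Iterable[tuple[str, str]],
    generate: Callable[[str], str],
    verify: Callable[[str], bool],
) -> list[AttackSample]:
    """Integration stage: keep only the samples that passed verification."""
    corpus = []
    for scenario, description in specs:
        sample = build_sample(scenario, description, generate, verify)
        if sample is not None:
            corpus.append(sample)
    return corpus


# Inert stand-ins for the real generation and verification stages.
def stub_generate(description: str) -> str:
    return f"# placeholder for: {description}" if description else ""


def stub_verify(code: str) -> bool:
    return code.startswith("# placeholder")


specs = [("calendar", "described attack A"), ("notes", "")]
corpus = build_corpus(specs, stub_generate, stub_verify)
print(len(corpus))  # 1
```

The point of the sketch is the ordering: verification sits between generation and integration, so a sample that does not pass is never added to the benchmark.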

Section 04

Experimental Results: Significant Agent Vulnerability, Insufficient Existing Defenses

Attack Success Rate

FPP: 86.3% (most fixed attacks succeed)
SMP: 69.3% (stealthy delayed attacks still have high success rates)

Hidden Risks

Most attacks that appear to fail do so because the agent never invoked the skill correctly, not because it resisted the attack. The true defense rate is therefore lower than the raw failure rate suggests.
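
A short worked example shows how much this distinction can matter. The counts below are hypothetical and chosen only for illustration; they are not figures from the paper, and `attack_success_rate` is an assumed helper name.

```python
def attack_success_rate(successes: int, attempts: int) -> float:
    """Fraction of attempts in which the attack achieved its goal."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return successes / attempts


# Hypothetical counts for one agent (not taken from the paper).
attempts = 100     # attack samples presented to the agent
successes = 80     # attacks that achieved their goal
not_invoked = 15   # failures where the agent never ran the skill correctly
resisted = attempts - successes - not_invoked  # failures where the agent actually refused

raw_asr = attack_success_rate(successes, attempts)
# Condition on the skill having been invoked at all.
conditional_asr = attack_success_rate(successes, attempts - not_invoked)
apparent_defense = (attempts - successes) / attempts
true_defense = resisted / attempts

print(f"raw ASR:          {raw_asr:.1%}")           # 80.0%
print(f"conditional ASR:  {conditional_asr:.1%}")   # 94.1%
print(f"apparent defense: {apparent_defense:.1%}")  # 20.0%
print(f"true defense:     {true_defense:.1%}")      # 5.0%
```

With these made-up numbers, the agent appears to stop 20% of attacks but genuinely resists only 5%; the rest of the "failures" are invocation errors.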

Limitations of Existing Defenses

Static analysis struggles to detect SMP (initial code is benign)
The principle of least privilege is hard to apply in practice (legitimate skills require broad permissions)
Behavior monitoring produces many false positives, and sandboxing adds complexity
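
The first limitation can be made concrete: a scan at install time sees only the pre-mutation files. One lifecycle-aware check, which is not a defense evaluated in the source, is to compare file digests before and after a run. The sketch below assumes a skill is a directory of files; `snapshot` and `changed_files` are hypothetical helpers, and the "execution" is simulated by rewriting files in a temporary directory.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(skill_dir: Path) -> dict[str, str]:
    """Map each file's relative path to the SHA-256 digest of its contents."""
    return {
        str(p.relative_to(skill_dir)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(skill_dir.rglob("*"))
        if p.is_file()
    }


def changed_files(before: dict[str, str], after: dict[str, str]) -> set[str]:
    """Paths that were added, removed, or modified between two snapshots."""
    return {
        path
        for path in before.keys() | after.keys()
        if before.get(path) != after.get(path)
    }


with tempfile.TemporaryDirectory() as tmp:
    skill_dir = Path(tmp)
    (skill_dir / "skill.py").write_text("print('hello')\n")
    before = snapshot(skill_dir)

    # Stand-in for a first execution that rewrites the skill's own files.
    (skill_dir / "skill.py").write_text("print('changed')\n")
    (skill_dir / "state.json").write_text("{}")
    after = snapshot(skill_dir)

    diff = changed_files(before, after)

print(sorted(diff))  # ['skill.py', 'state.json']
```

Legitimate skills also write persistent state, so a non-empty diff is a signal for review and not a verdict, which is consistent with the false-positive concern noted above.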

Risk distribution: Data pipeline > System environment > Agent autonomy.

Section 05

Conclusion: Skill Security Urgently Needs Resolution, SkillHarm Provides Research Foundation

SkillHarm is the first benchmark for lifecycle security assessment of skills, and it reveals severe vulnerabilities in the current agent ecosystem. The high attack success rates show that skill security is an urgent problem. As agents are deployed in critical scenarios, the security of the skill ecosystem will become an important topic in AI governance, and SkillHarm provides a foundation for subsequent research.

Section 06

Recommendations and Future Directions: Multi-dimensional Improvement of Skill Security

Ecosystem Recommendations

Developers: Consider security when designing skills
Platforms: Audit skills strictly, especially for stealthy SMP attacks
Users: Be vigilant about third-party skill risks
Security community: Develop targeted detection and defense technologies

Future Research

Dynamic analysis tools: Detect malicious behavior at runtime
Formal verification: Security verification of skill code
User behavior research: Enhance security awareness
Cross-platform expansion: Cover more agent frameworks
Defense benchmarks: Evaluate the effectiveness of defense mechanisms
