
SkillDroid: A Skill Compilation and Reuse Framework for Mobile GUI Agents

SkillDroid compiles successful LLM-guided GUI trajectories into parameterized skill templates. Using a three-level matching cascade and a failure-learning mechanism, it replays skills with zero LLM calls, reaching an 85.3% success rate (rising to 91% with continued use) while reducing LLM calls by 49%.

Tags: mobile GUI agents, skill compilation, trajectory reuse, failure learning, efficiency optimization

Published 2026-04-16 19:02


Section 01

Core Introduction to the SkillDroid Framework

SkillDroid is a skill compilation and reuse framework for mobile GUI agents. Its core innovation is compiling successful LLM-guided GUI trajectories into parameterized skill templates, which a three-level matching cascade and a failure-learning mechanism then allow it to replay with zero LLM calls. The framework reaches an 85.3% success rate (rising to 91% with continued use) while reducing LLM calls by 49%, addressing the efficiency and reliability problems of current LLM-based GUI agents.

Section 02

The Statefulness Dilemma of Mobile GUI Agents

LLM-based mobile GUI agents can follow natural-language instructions to complete a wide range of tasks, but they share a fundamental efficiency problem: they are stateless. Each task is handled as an independent reasoning process, so every action step requires full LLM inference, even for a task the agent has already completed many times. The result is redundant computation, accumulated latency, inconsistent reliability, and high cost. Humans reuse experience when they repeat a task; current agents cannot, and this is the core problem SkillDroid sets out to solve.

Section 03

Skill Compilation: From Inference to Replay

SkillDroid's core innovation is skill compilation: converting successful LLM-guided GUI trajectories into reusable, parameterized skill templates. A skill template has three key components:

UI action sequence: the structured, concrete operation steps (click, swipe, etc.);
Weighted element locator: several positioning strategies (resource ID, text, visual features), each assigned a weight;
Typed parameter slots: placeholders into which variable parameters (e.g., recipient, message content) are injected at execution time.

The compilation process analyzes a successful trajectory, identifies its parameterizable parts and decision points, and generates a general template. This is similar to compiling an interpreted script into machine code: compile once, execute many times.
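To make the template components concrete, here is a minimal Python sketch of a skill template with typed parameter slots and a naive compiler. It assumes a recorded trajectory is a list of actions and that the concrete argument values (recipient, message text) are already known, so "compilation" here only abstracts those values into named slots; decision-point detection is not modeled, and the weighted locator is reduced to a single target string. The names Action, ParamSlot, SkillTemplate, and compile_trajectory, and the element IDs, are hypothetical and not SkillDroid's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class Action:
    kind: str       # e.g. "click", "type", "swipe"
    target: str     # hypothetical element ID; a real template would store a weighted locator
    text: str = ""  # text to enter; may contain "{slot}" placeholders after compilation

@dataclass
class ParamSlot:
    name: str
    py_type: type   # expected type of the injected value

@dataclass
class SkillTemplate:
    name: str
    actions: list[Action]
    slots: list[ParamSlot] = field(default_factory=list)

    def instantiate(self, **params) -> list[Action]:
        """Type-check the parameters, then inject them into the slot placeholders."""
        for slot in self.slots:
            if slot.name not in params:
                raise KeyError(f"missing parameter: {slot.name}")
            if not isinstance(params[slot.name], slot.py_type):
                raise TypeError(f"{slot.name} must be {slot.py_type.__name__}")
        concrete = []
        for a in self.actions:
            text = a.text
            for slot in self.slots:
                text = text.replace("{" + slot.name + "}", str(params[slot.name]))
            concrete.append(Action(a.kind, a.target, text))
        return concrete

def compile_trajectory(name: str, trajectory: list[Action],
                       arguments: dict[str, str]) -> SkillTemplate:
    """Naive compilation (hypothetical): replace each known argument value with a named slot."""
    actions = []
    for act in trajectory:
        text = act.text
        for slot_name, value in arguments.items():
            if value and value in text:
                text = text.replace(value, "{" + slot_name + "}")
        actions.append(Action(act.kind, act.target, text))
    slots = [ParamSlot(n, type(v)) for n, v in arguments.items()]
    return SkillTemplate(name, actions, slots)

# Usage: compile one recorded "send message" run, then replay it with new values
recorded = [
    Action("click", "new_chat"),
    Action("type", "search_box", "Alice"),
    Action("type", "message_input", "See you at 5"),
    Action("click", "send_button"),
]
skill = compile_trajectory("send_message", recorded,
                           {"recipient": "Alice", "content": "See you at 5"})
replay = skill.instantiate(recipient="Bob", content="Running late")
print([a.text for a in replay])  # → ['', 'Bob', 'Running late', '']
```

The replay step needs no model call: once the template exists, new parameter values are substituted locally and type-checked before any action runs.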

Section 04

Three-Layer Architecture: Matching, Execution, and Learning

SkillDroid adopts a three-layer architecture:

Matching cascade: when a new instruction arrives, three levels of filtering quickly find an applicable skill template: regular-expression pattern matching → embedding similarity matching → application-context filtering;
Skill replay: with zero LLM calls, the template's actions are executed locally, using the weighted locators to identify elements and injecting the parameters. In tests, replay succeeded 100% of the time and ran 2.4x faster than full LLM execution;
Failure learning: when replay fails, the system analyzes the cause (a UI update, a change in the task flow) and applies a repair strategy, such as updating locator weights, adjusting the action sequence, or recompiling, so the skill library stays effective over time.
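The matching cascade can be sketched as a function that tries the cheapest, most precise check first and returns None when the task should fall back to full LLM execution. This is an illustrative sketch, not SkillDroid's implementation: the SkillEntry fields, the 0.5 similarity threshold, and the app package names are assumptions, a bag-of-words cosine similarity stands in for a real embedding model, and the application-context filter is applied to each level's candidates. Regex hits extract parameters through named groups; an embedding-only hit returns no parameters, and how its slots would be filled is left open.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass

@dataclass
class SkillEntry:
    name: str
    pattern: str      # regex with named groups that capture parameters (assumed format)
    description: str  # text compared against the instruction at level 2
    app: str          # package of the app the skill was compiled against

def embed(text: str) -> Counter:
    # Stand-in for a real sentence-embedding model: bag-of-words term counts
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def in_context(candidates, current_app):
    # Level 3: drop skills compiled for a different application
    return [c for c in candidates if c[1].app == current_app]

def match(instruction: str, current_app: str, library: list[SkillEntry], threshold: float = 0.5):
    """Return (skill, params), or None to fall back to full LLM execution."""
    # Level 1: regex patterns are cheap and precise, and extract parameters directly
    candidates = []
    for skill in library:
        m = re.fullmatch(skill.pattern, instruction, flags=re.IGNORECASE)
        if m:
            candidates.append((1.0, skill, m.groupdict()))
    candidates = in_context(candidates, current_app)
    # Level 2: embedding similarity, tried only when no regex pattern applies
    if not candidates:
        query = embed(instruction)
        for skill in library:
            score = cosine(query, embed(skill.description))
            if score >= threshold:
                candidates.append((score, skill, {}))
        candidates = in_context(candidates, current_app)
    if not candidates:
        return None
    _, skill, params = max(candidates, key=lambda c: c[0])
    return skill, params

library = [
    SkillEntry("send_message", r"send (?P<content>.+) to (?P<recipient>\w+)",
               "send a chat message to a contact", "com.example.msg"),
    SkillEntry("set_alarm", r"set an alarm for (?P<time>[\d:]+)",
               "set an alarm clock", "com.example.clock"),
]
skill, params = match("Send hello there to Bob", "com.example.msg", library)
print(skill.name, params)  # → send_message {'content': 'hello there', 'recipient': 'Bob'}
```

Ordering the levels from cheapest to most expensive means most common instructions are resolved by a regex check, and the similarity computation only runs when no pattern applies.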

Section 05

Longitudinal Evaluation Results: Efficiency and Reliability Improvements

In a 150-round longitudinal evaluation, SkillDroid showed the following results:

Success rate: 85.3% overall, 23 percentage points above the stateless LLM baseline. SkillDroid's success rate rose from 87% to 91% with continued use, while the baseline's fell from 80% to 44%;
LLM calls: reduced by 49%, with nearly half of all tasks completed via replay;
Robustness: when UI updates change elements, the weighted locators and failure learning keep the system adaptable, and recompilation restores functionality quickly.
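The robustness result rests on the weighted locator: if one attribute changes in an update, the others can still identify the element, and failure learning can then shift weight away from the attribute that proved unstable. The sketch below illustrates that idea under assumed details; the attribute names, initial weights, acceptance threshold, and multiplicative update rule in learn() are all hypothetical, not taken from SkillDroid. learn() is meant to run once the correct element has been identified, for example after a fallback match or a repair.

```python
from dataclasses import dataclass, field

@dataclass
class WeightedLocator:
    # Attribute values recorded when the skill was compiled (hypothetical schema)
    expected: dict[str, str]
    # Per-strategy weights (assumed initial values): resource ID trusted most
    weights: dict[str, float] = field(default_factory=lambda: {
        "resource_id": 0.5, "text": 0.3, "visual": 0.2})
    threshold: float = 0.4  # minimum total weight needed to accept a match

    def _matches(self, element: dict, key: str) -> bool:
        return key in self.expected and element.get(key) == self.expected[key]

    def score(self, element: dict) -> float:
        # Sum the weights of every strategy whose recorded value still matches
        return sum(w for k, w in self.weights.items() if self._matches(element, k))

    def resolve(self, screen: list[dict]):
        """Return the best-scoring on-screen element, or None if none clears the threshold."""
        best = max(screen, key=self.score, default=None)
        if best is None or self.score(best) < self.threshold:
            return None
        return best

    def learn(self, element: dict, lr: float = 0.5) -> None:
        """Shift weight toward strategies that still identified the element,
        renormalize, then record the element's current attribute values."""
        for k in self.weights:
            self.weights[k] *= (1 + lr) if self._matches(element, k) else (1 - lr)
        total = sum(self.weights.values())
        self.weights = {k: w / total for k, w in self.weights.items()}
        self.expected.update({k: element[k] for k in self.weights if k in element})

loc = WeightedLocator({"resource_id": "send_btn", "text": "Send", "visual": "paper_plane"})
# After a hypothetical app update, the Send button's resource ID has changed
screen = [
    {"resource_id": "btn_submit_v2", "text": "Send", "visual": "paper_plane"},
    {"resource_id": "attach", "text": "Attach", "visual": "clip"},
]
hit = loc.resolve(screen)
print(hit["resource_id"])  # → btn_submit_v2
loc.learn(hit)
print(loc.weights)  # resource_id now carries less weight than text and visual
```

Because text and visual features together clear the threshold, the changed button is still found without recompiling; the weight update then makes the locator rely less on the attribute that broke.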

Section 06

Implications for GUI Agent Design

SkillDroid's results suggest several lessons for GUI agent design:

Hybrid architecture: keep the LLM for novel tasks while handling common tasks efficiently through a skill library, combining a general-purpose model with specialized, compiled skills;
Learning as compilation: turn a one-time successful execution into a reusable program; the same idea could extend to other AI areas such as code generation, dialogue, and robotics;
Continuous improvement loop: evolve the system online through lightweight local adaptation (failure learning) rather than expensive retraining, improving reliability in production deployments.

Section 07

Limitations and Future Directions

SkillDroid has limitations. It mainly targets deterministic tasks, so its reuse value is limited for creative or context-sensitive tasks, and skill library management and deduplication need further research. Future directions include cross-application skill migration, skill composition and nesting, and expansion to desktop and web GUI scenarios.
