SkillDroid: A Skill Compilation and Reuse Framework for Mobile GUI Agents

SkillDroid compiles successful LLM-guided GUI trajectories into parameterized skill templates. A three-level matching cascade routes new instructions to these templates, which replay with zero LLM calls, and a failure learning mechanism repairs templates that break. It reports an 85.3% success rate (rising to 91% with usage) while reducing LLM calls by 49%.

Mobile GUI agents, skill compilation, trajectory reuse, failure learning, efficiency optimization
Published 2026-04-16 19:02 | Recent activity 2026-04-17 10:30 | Estimated read 7 min

Section 01

Core Introduction to the SkillDroid Framework

SkillDroid is a skill compilation and reuse framework for mobile GUI agents. Its core innovation is compiling successful LLM-guided GUI trajectories into parameterized skill templates. A three-level matching cascade routes incoming instructions to a suitable template, the template is replayed with zero LLM calls, and a failure learning mechanism repairs templates whose replay fails. The framework reports an 85.3% success rate (rising to 91% with usage) while reducing LLM calls by 49%, addressing the efficiency and reliability problems of current LLM-based GUI agents.

Section 02

The Statefulness Dilemma of Mobile GUI Agents

LLM-based mobile GUI agents can follow natural language instructions to complete a wide range of tasks, but they face a fundamental efficiency problem: they are stateless. Each task is treated as an independent reasoning process that requires full LLM inference at every action step. The result is redundant computation, accumulated latency, unstable reliability, and high cost. Humans reuse experience when a task repeats, but current agents cannot. This is the core problem SkillDroid aims to solve.

Section 03

Skill Compilation: From Inference to Replay

SkillDroid's core innovation is skill compilation: converting successful LLM-guided GUI trajectories into reusable, parameterized skill templates. A skill template has three key components:

  1. UI action sequence: Structured concrete operation steps (click, swipe, etc.);
  2. Weighted element locator: Multiple positioning strategies (resource ID, text, visual features) and weight assignment;
  3. Typed parameter slots: Allow injection of variable parameters (e.g., recipient, content) during execution.

The compilation process analyzes successful trajectories, identifies parameterizable parts and decision points, and generates a general template. This is similar to compiling an interpreted script into machine code: compile once, execute many times.
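
The three components can be pictured as a small data structure. The following is a minimal sketch only, not SkillDroid's actual schema: the class names (`Locator`, `Action`, `SkillTemplate`), the `bind` method, the `{slot}` placeholder syntax, and all element IDs and weights are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Locator:
    strategy: str  # e.g. "resource_id", "text", "visual"
    value: str
    weight: float  # relative trust in this strategy (illustrative values)


@dataclass
class Action:
    kind: str  # e.g. "click", "swipe", "type"
    locators: List[Locator] = field(default_factory=list)
    text: Optional[str] = None  # may contain "{slot}" placeholders


@dataclass
class SkillTemplate:
    name: str
    slots: Dict[str, type]  # typed parameter slots: slot name -> expected type
    actions: List[Action]

    def bind(self, **params: Any) -> List[Action]:
        """Check parameters against the slots, then fill the placeholders."""
        for slot, expected in self.slots.items():
            if slot not in params:
                raise KeyError(f"missing parameter: {slot}")
            if not isinstance(params[slot], expected):
                raise TypeError(f"{slot} must be {expected.__name__}")
        return [
            Action(a.kind, a.locators,
                   a.text.format(**params) if a.text is not None else None)
            for a in self.actions
        ]


# Hypothetical template compiled from one successful "send a message" run.
send_message = SkillTemplate(
    name="send_message",
    slots={"recipient": str, "content": str},
    actions=[
        Action("click", [Locator("resource_id", "search_box", 0.7),
                         Locator("text", "Search", 0.3)]),
        Action("type", text="{recipient}"),
        Action("type", text="{content}"),
        Action("click", [Locator("resource_id", "btn_send", 0.6),
                         Locator("text", "Send", 0.4)]),
    ],
)

steps = send_message.bind(recipient="Alice", content="Running late")
```

Binding produces a concrete action list without touching the template itself, so the same compiled template can be reused with different recipients and contents.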

Section 04

Three-Layer Architecture: Matching, Execution, and Learning

SkillDroid adopts a three-layer architecture:

  1. Matching cascade: When a new instruction arrives, three levels of filtering quickly find applicable skill templates: regular-expression pattern matching → embedding similarity matching → application context filtering;
  2. Skill replay: Template actions execute locally with zero LLM calls, using the weighted locators to identify elements and injecting parameters into the slots. In tests, the replay success rate is 100%, and replay runs at 2.4x the speed of full LLM execution;
  3. Failure learning: When replay fails, the system analyzes the cause (UI update, process change) and applies a repair strategy such as updating locator weights, adjusting the action sequence, or recompiling, so that the skill library remains effective over the long term.
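
To make the matching cascade concrete, here is a minimal sketch of one possible reading of the three levels: try regex first, fall back to similarity only when no regex matches, then filter by the current app. The `Skill` fields, the `match` function, the 0.6 threshold, and the example patterns are all assumptions, and the bag-of-words `embed` function is a toy stand-in for a real embedding model.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Skill:
    name: str
    app: str      # app the skill was compiled in
    pattern: str  # regex with named groups for the parameters
    example: str  # instruction the skill was compiled from


def embed(text: str) -> Counter:
    # Toy stand-in for a sentence-embedding model: bag of lowercase words.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def match(instruction: str, current_app: str, skills: List[Skill],
          threshold: float = 0.6) -> Optional[Skill]:
    # Level 1: cheap regex match on the instruction text.
    candidates = [s for s in skills
                  if re.fullmatch(s.pattern, instruction, re.IGNORECASE)]
    # Level 2: similarity search, used only when no regex matched.
    if not candidates:
        query = embed(instruction)
        candidates = [s for s in skills
                      if cosine(query, embed(s.example)) >= threshold]
    # Level 3: keep only skills compiled for the app currently in use.
    candidates = [s for s in candidates if s.app == current_app]
    return candidates[0] if candidates else None


skills = [
    Skill("send_message", "messenger",
          r"send (?P<content>.+) to (?P<recipient>\w+)", "send hello to Alice"),
    Skill("set_alarm", "clock",
          r"set an alarm for (?P<time>.+)", "set an alarm for 7 am"),
]
```

A `None` result means no template applies, in which case the agent would fall back to full LLM-guided execution.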

Section 05

Longitudinal Evaluation Results: Efficiency and Reliability Improvements

SkillDroid was evaluated longitudinally over 150 rounds, with the following results:

  • Success rate: 85.3% overall, 23 percentage points higher than the stateless LLM baseline. With usage, SkillDroid's success rate rose from 87% to 91%, while the baseline's success rate dropped from 80% to 44%;
  • Reduced LLM calls: Cut LLM calls by 49%, with nearly half of tasks completed via replay;
  • Robustness: When UI updates cause element changes, the weighted locator and failure learning mechanism ensure system adaptability, allowing quick function recovery via recompilation.
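
The robustness point can be illustrated with a small sketch of weighted element location and a weight update after a UI change. The source does not specify the scoring or update rule, so the summed-weight scoring, the multiplicative update in `update_weights`, the `lr` parameter, and the element attributes below are all assumptions.

```python
from typing import Dict, List, Optional, Tuple

Locator = Tuple[str, str]  # (strategy, value), e.g. ("text", "Send")
Element = Dict[str, str]   # on-screen element attributes


def resolve(weights: Dict[Locator, float],
            screen: List[Element]) -> Optional[Element]:
    """Pick the element whose satisfied locators carry the most weight."""
    best, best_score = None, 0.0
    for element in screen:
        score = sum(w for (strategy, value), w in weights.items()
                    if element.get(strategy) == value)
        if score > best_score:
            best, best_score = element, score
    return best


def update_weights(weights: Dict[Locator, float], element: Element,
                   lr: float = 0.5) -> Dict[Locator, float]:
    """Boost locators that matched the chosen element, decay the rest,
    then renormalize so the weights sum to 1 (an assumed update rule)."""
    raw = {
        (strategy, value): w * (1 + lr) if element.get(strategy) == value
        else w * (1 - lr)
        for (strategy, value), w in weights.items()
    }
    total = sum(raw.values())
    return {k: v / total for k, v in raw.items()}


weights = {("resource_id", "btn_send"): 0.6, ("text", "Send"): 0.4}

# After a hypothetical app update the resource ID changed, but the label did not.
screen = [
    {"resource_id": "btn_submit_v2", "text": "Send"},
    {"resource_id": "btn_cancel", "text": "Cancel"},
]

target = resolve(weights, screen)
new_weights = update_weights(weights, target)
```

In this sketch the text locator still finds the button after the resource ID changes, and the update shifts trust toward the strategy that kept working.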

Section 06

Implications for GUI Agent Design

SkillDroid's results suggest three lessons for GUI agent design:

  1. Hybrid architecture: Retain the LLM's ability to handle novel tasks while processing common tasks efficiently via a skill library, combining a general-purpose path with a specialized one;
  2. Learning as compilation: Convert one-time successful execution into reusable programs, which can be extended to AI fields like code generation, dialogue, and robotics;
  3. Continuous improvement loop: Achieve online system evolution through lightweight local adaptation (failure learning), without expensive retraining, improving reliability in production deployments.
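
The three points above combine into a simple control loop: replay when a skill exists, fall back to the LLM otherwise, and compile each LLM success into the library. The sketch below is an illustration under stated assumptions, not SkillDroid's implementation: `HybridAgent`, `run_llm`, and `replay` are hypothetical names, an exact-match dictionary lookup stands in for the matching cascade, and dropping a failed skill stands in for failure learning.

```python
from typing import Callable, Dict, List, Optional

Trajectory = List[str]  # simplified: a list of action descriptions


class HybridAgent:
    def __init__(self,
                 run_llm: Callable[[str], Optional[Trajectory]],
                 replay: Callable[[Trajectory], bool]) -> None:
        self.run_llm = run_llm  # full LLM-guided execution; None on failure
        self.replay = replay    # local replay; True on success
        self.library: Dict[str, Trajectory] = {}
        self.llm_tasks = 0      # number of tasks that needed the LLM

    def handle(self, instruction: str) -> bool:
        skill = self.library.get(instruction)
        if skill is not None:
            if self.replay(skill):
                return True                # fast path: zero LLM calls
            del self.library[instruction]  # replay failed: recompile below
        self.llm_tasks += 1
        trajectory = self.run_llm(instruction)
        if trajectory is None:
            return False
        self.library[instruction] = trajectory  # compile the success for reuse
        return True


agent = HybridAgent(
    run_llm=lambda instruction: ["open app", "tap send"],
    replay=lambda trajectory: True,
)
first = agent.handle("send hello to Alice")
second = agent.handle("send hello to Alice")
```

Here the second call is served from the library, so only the first task consults the LLM.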

Section 07

Limitations and Future Directions

SkillDroid has limitations: it mainly targets deterministic tasks, with limited reuse value for creative/context-sensitive tasks; skill library management and deduplication need further research. Future directions include: cross-application skill migration, skill combination and nesting, and expansion to desktop and Web GUI scenarios.