Zing Forum

Pocket Crew: A Mixture-of-Agents Reasoning System Running on Flagship Smartphones

Pocket Crew is an on-device AI reasoning system built for flagship smartphones around the Mixture-of-Agents (MoA) architecture. Several small models independently generate draft answers, and a synthesizer then evaluates the drafts and merges their strongest reasoning into a final answer. A sequential loading strategy keeps memory usage under control, so the system can deliver high-quality reasoning locally, without an internet connection.

Tags: Mixture-of-Agents, on-device AI, mobile LLM, Android, local inference, privacy-preserving AI, smartphone AI, llama.cpp, edge computing, MoA
Published 2026-03-31 20:15 | Recent activity 2026-03-31 20:19 | Estimated read 5 min

Section 01

Pocket Crew: Introduction to the On-Device MoA Reasoning System for Flagship Smartphones

Pocket Crew is an on-device AI reasoning system designed for flagship smartphones and built on the Mixture-of-Agents (MoA) architecture. It has three core features: it runs locally without an internet connection and produces high-quality answers through multi-model collaboration; it uses a sequential loading strategy to keep memory usage within smartphone limits; and it completes all reasoning on the device, which protects user privacy. This article covers background, architecture, implementation, applications, and future prospects.


Section 02

Challenges of On-Device AI and Solutions from the MoA Architecture

As smartphone computing power improves, on-device AI is moving toward complex reasoning, but it runs into a tension between model capability on one side and memory and battery life on the other. The Mixture-of-Agents architecture offers one way through: several small and medium-sized models independently generate draft answers, and a synthesizer merges the strongest reasoning from those drafts, which is intended to improve answer quality while using resources efficiently. Pocket Crew applies this idea to mobile devices, enabling local multi-model collaborative reasoning.
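To make the draft-then-synthesize idea concrete, here is a minimal sketch of how independent drafts might be packed into a single prompt for the synthesizer. It is an illustration only: the function name `build_synthesis_prompt`, the prompt wording, and the agent labels are assumptions, not Pocket Crew's actual code (which the article describes as written in Kotlin).

```python
def build_synthesis_prompt(question: str, drafts: dict[str, str]) -> str:
    """Combine independent drafts into one prompt for a synthesizer model.

    Hypothetical helper: the instruction text and layout are assumptions.
    """
    if not drafts:
        raise ValueError("at least one draft is required")

    lines = [
        "You are a synthesizer. Compare the drafts below, keep the soundest",
        "reasoning, resolve disagreements, and write one final answer.",
        "",
        f"Question: {question}",
        "",
    ]
    # Each draft is labeled with the agent that produced it, so the
    # synthesizer can refer to drafts individually.
    for index, (agent, draft) in enumerate(drafts.items(), start=1):
        lines.append(f"Draft {index} (from {agent}):")
        lines.append(draft.strip())
        lines.append("")
    lines.append("Final answer:")
    return "\n".join(lines)
```

The drafts are kept separate and labeled rather than concatenated blindly, since the synthesizer's job is to compare them, not just to continue them.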


Section 03

Core Architecture and Memory Optimization of Pocket Crew

MoA Pipeline: Two stages. In draft generation, multiple models each produce an answer with a different focus; in synthesis, the synthesizer evaluates the drafts and merges their strongest reasoning.

Memory Optimization: A sequential loading strategy. Draft models are loaded one at a time and unloaded once their draft is generated, so that only the synthesizer remains in memory. This works around smartphone memory limits.

Privacy Protection: All reasoning is completed locally and data never leaves the device, which makes the system suitable for sensitive scenarios.
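The memory effect of sequential loading can be sketched with a small simulation. Everything here is an assumption made for illustration: the model names, the sizes in MB, the `MemoryTracker` class, and the stand-in "inference" function are hypothetical, and the sketch assumes the synthesizer is loaded only after the last draft model has been unloaded. It also releases the synthesizer at the end to keep the accounting simple, whereas a real app might keep it resident.

```python
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class MemoryTracker:
    """Tracks simulated resident model memory, in MB."""
    resident_mb: int = 0
    peak_mb: int = 0

    def allocate(self, size_mb: int) -> None:
        self.resident_mb += size_mb
        self.peak_mb = max(self.peak_mb, self.resident_mb)

    def release(self, size_mb: int) -> None:
        self.resident_mb -= size_mb


@dataclass(frozen=True)
class ModelSpec:
    name: str      # hypothetical model name
    size_mb: int   # hypothetical in-memory footprint


@contextmanager
def loaded(spec: ModelSpec, tracker: MemoryTracker):
    """Keep a model 'loaded' for the duration of the block, then unload it."""
    tracker.allocate(spec.size_mb)
    try:
        # Stand-in for a real inference call.
        yield lambda prompt: f"[{spec.name}] answer to: {prompt}"
    finally:
        tracker.release(spec.size_mb)


def run_pipeline(prompt, drafters, synthesizer, tracker):
    """Generate drafts one model at a time, then synthesize."""
    drafts = []
    for spec in drafters:
        # Only one draft model is resident at any moment.
        with loaded(spec, tracker) as generate:
            drafts.append(generate(prompt))
    with loaded(synthesizer, tracker) as generate:
        joined = " | ".join(drafts)
        return generate(f"fuse drafts for '{prompt}': {joined}")


tracker = MemoryTracker()
drafters = [ModelSpec("drafter-a", 1200), ModelSpec("drafter-b", 1500)]
synth = ModelSpec("synthesizer", 2000)
answer = run_pipeline("What is 2 + 2?", drafters, synth, tracker)

print(tracker.peak_mb)                                     # 2000
print(sum(m.size_mb for m in drafters) + synth.size_mb)    # 4700
```

With these made-up sizes, the peak equals the largest single model (2000 MB) rather than the sum of all three (4700 MB). The trade-off is latency: each model has to be loaded from storage before it can generate.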


Section 04

Technical Implementation Details of Pocket Crew

Native Android Development: Written in Kotlin on top of the llama-android module ported from llama.cpp. The components are agents (collaboration logic), core (scheduling and memory management), and feature (UI).

Model Configuration: The model set and loading order are customized through model_config.json. A BYOK (bring your own key) mode, which swaps in cloud models, is planned for the future.

ARM Optimization: KleidiAI and the Vulkan SDK are integrated for acceleration, and the NEON instruction set is used to optimize computation.
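The article does not show the contents of model_config.json, so the following is only a guess at what a configuration-driven loading order could look like. The field names (`drafters`, `synthesizer`, `name`, `path`, `order`), the file paths, and the `load_order` helper are all hypothetical and may differ from the real schema.

```python
import json

# Hypothetical config text; the real model_config.json schema is not
# documented in the article and may look quite different.
EXAMPLE_CONFIG = """
{
  "drafters": [
    {"name": "drafter-b", "path": "models/drafter-b.gguf", "order": 2},
    {"name": "drafter-a", "path": "models/drafter-a.gguf", "order": 1}
  ],
  "synthesizer": {"name": "synthesizer", "path": "models/synthesizer.gguf"}
}
"""


def load_order(config_text: str) -> list[str]:
    """Return model names in the order they would be loaded.

    Draft models are sorted by their (assumed) "order" field; the
    synthesizer always comes last.
    """
    config = json.loads(config_text)
    drafters = sorted(config["drafters"], key=lambda m: m["order"])
    names = [m["name"] for m in drafters]
    if len(set(names)) != len(names):
        raise ValueError("duplicate drafter names in config")
    return names + [config["synthesizer"]["name"]]


print(load_order(EXAMPLE_CONFIG))  # ['drafter-a', 'drafter-b', 'synthesizer']
```

Keeping the order in a config file rather than in code would let users swap model sets without rebuilding the app, which matches the customization the article describes.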


Section 05

Application Scenarios and Value of Pocket Crew

  1. Privacy-Sensitive Scenarios: For lawyers, doctors, and others who handle sensitive information, local reasoning eliminates the risk of cloud leakage.
  2. Offline Environments: Usable on planes, on subways, or in remote areas, for tasks such as translation, guide summaries, and data processing.
  3. Daily Assistant: Email writing, schedule planning, and study tutoring, where multi-model collaboration provides more comprehensive answers.


Section 06

Significance and Future Prospects of Pocket Crew

Pocket Crew represents one direction for on-device AI: improving results through architectural innovation rather than by scaling up a single model. Because it is open source, the community can experiment with model combinations, task-specific optimization, and extension to IoT devices. A planned BYOK mode would combine on-device and cloud models, and as smartphone computing power improves, the system is expected to reach more devices, making high-quality AI reasoning available on the go.