Zing Forum


Running Large Language Models Locally on Android Phones: Analysis of the Local LLM/AI Project

Explore how to run lightweight large language models like Qwen, DeepSeek, and Gemma completely offline on mobile devices to achieve a privacy-protected local AI conversation experience.

Tags: Android, Local Large Language Models, On-Device AI, MediaPipe, Privacy Protection, Mobile Development, Jetpack Compose, Offline AI
Published 2026-05-31 14:23 · Recent activity 2026-05-31 14:51 · Estimated read 7 min

Section 01

Introduction / Main Post: Running Large Language Models Locally on Android Phones: Analysis of the Local LLM/AI Project

Section 02

Original Author and Source

Section 03

Introduction: The AI Privacy Revolution on Mobile Devices

As Large Language Model (LLM) technology has advanced rapidly, users have come to rely more heavily on AI assistants. However, most AI applications send user data to cloud servers for processing, which raises serious privacy concerns. The Local LLM/AI project addresses this problem. It is a high-performance offline app designed for Android devices that runs complete AI models on the phone itself. Users can hold conversations without an internet connection, and their data stays on the device.

Section 04

Project Overview and Core Technologies

Local LLM/AI is built on Google's MediaPipe Tasks GenAI engine. Its interface is written in Jetpack Compose and follows the Material 3 design guidelines. The project's core innovation is getting large language models, which normally depend on cloud computing resources, to run efficiently on mobile hardware.

The app uses a dual-version build strategy to adapt to different hardware configurations:

  • Normal Version: Optimized for mobile GPUs that support Vulkan. It provides responsive streaming generation and falls back gracefully to CPU inference when no usable GPU is available.
  • NPU Version: Designed for modern phones equipped with a Neural Processing Unit (NPU). It calls the device's AI chip directly through NNAPI for more energy-efficient inference.
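
The fallback behavior described above can be sketched in plain Java. This is a minimal illustration, not the project's code. The `Backend` and `Flavor` enums, the `BackendSelector` class, and the `tryInit` check passed in by the caller are all hypothetical. The NPU build falling back straight to CPU is an assumption that the description does not confirm.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical names for illustration; these are not MediaPipe types.
enum Backend { NPU, GPU, CPU }

enum Flavor { NORMAL, NPU }

final class BackendSelector {
    private BackendSelector() {}

    /** Backends in the order each build flavor tries them; CPU is always last. */
    static List<Backend> preferenceOrder(Flavor flavor) {
        return switch (flavor) {
            case NORMAL -> List.of(Backend.GPU, Backend.CPU);
            case NPU -> List.of(Backend.NPU, Backend.CPU); // assumed fallback
        };
    }

    /**
     * Returns the first backend whose initialization succeeds.
     * tryInit should attempt to start the engine on that backend and
     * return false (instead of throwing) if it fails.
     */
    static Optional<Backend> select(Flavor flavor, Predicate<Backend> tryInit) {
        for (Backend candidate : preferenceOrder(flavor)) {
            if (tryInit.test(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty(); // nothing worked, not even CPU
    }
}
```

In a real app, `tryInit` would wrap the engine's own initialization call and return `false` if that call throws. Keeping CPU at the end of every list means a working CPU path is always tried before giving up.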

Section 05

Supported Models and Hardware Requirements

The project has built-in support for several lightweight but capable open-weight models. Each model is optimized to fit within the compute and memory limits of mobile devices:

| Model | Developer | Parameters | Model File Size | Minimum Device RAM |
|---|---|---|---|---|
| Qwen 2.5 1.5B Instruct | Alibaba | 1.5B | ~1.6 GB | 6 GB+ |
| DeepSeek-R1 Distill Qwen 1.5B | DeepSeek | 1.5B | ~1.6 GB | 6 GB+ |
| Gemma 1.1 2B IT | Google | 2B | ~1.4 GB | 8 GB+ |
| Phi-2 2.7B | Microsoft | 2.7B | ~1.6 GB | 8 GB+ |
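
An app could check the memory column of the model table on the device itself, as in the sketch below. The `ModelSpec` record and the `ModelCatalog` class are hypothetical. The values are copied from the table, and the rule "device RAM at or above the listed minimum" is an assumed reading of that column.

```java
import java.util.List;

/** One row of the model table; fields mirror its columns. */
record ModelSpec(String name, String developer, double approxSizeGb, int minRamGb) {}

final class ModelCatalog {
    private ModelCatalog() {}

    // Values taken from the table; file sizes are approximate.
    static final List<ModelSpec> MODELS = List.of(
            new ModelSpec("Qwen 2.5 1.5B Instruct", "Alibaba", 1.6, 6),
            new ModelSpec("DeepSeek-R1 Distill Qwen 1.5B", "DeepSeek", 1.6, 6),
            new ModelSpec("Gemma 1.1 2B IT", "Google", 1.4, 8),
            new ModelSpec("Phi-2 2.7B", "Microsoft", 1.6, 8));

    /** Models whose minimum RAM requirement the device meets. */
    static List<ModelSpec> supportedOn(int deviceRamGb) {
        return MODELS.stream()
                .filter(m -> deviceRamGb >= m.minRamGb())
                .toList();
    }
}
```

On Android, `ActivityManager.MemoryInfo.totalMem` is one possible source for the device RAM figure. It is reported in bytes and usually comes out a little below the advertised capacity. As a result, the way it is rounded to whole gigabytes affects which models appear in the list.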

Because each model file is large (roughly 1.4 to 1.6 GB), the developer chose not to bundle the models in the APK. Instead, users download the models or transfer them to the device manually. This keeps the app itself small and lets users choose which models to install.
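
Because the model is not shipped in the APK, the app has to cope with a file that may be missing or only partly copied. The sketch below shows a minimal pre-load check. It assumes that the file has already been copied into app storage (so it has a filesystem path rather than a content URI) and that the app knows a rough minimum size for the model. `ModelFileCheck` and `looksUsable` are hypothetical names. A size check only catches truncation, not corruption.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class ModelFileCheck {
    private ModelFileCheck() {}

    /**
     * Cheap sanity check before handing a path to the inference engine:
     * the file must exist, be a readable regular file, and be at least
     * minBytes long. A truncated download or copy usually fails the size test.
     */
    static boolean looksUsable(Path path, long minBytes) {
        if (!Files.isRegularFile(path) || !Files.isReadable(path)) {
            return false;
        }
        try {
            return Files.size(path) >= minBytes;
        } catch (IOException e) {
            return false; // metadata could not be read: treat as unusable
        }
    }
}
```

For the models in the table, a threshold of about 1 GB would reject an obviously truncated copy without rejecting any listed model.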

Section 06

Fully Offline Multimodal Capabilities

Local LLM/AI is not limited to text conversation; it also handles multimodal input. The app uses the text recognition feature of Google ML Kit to extract text from images (OCR) entirely offline. Users can photograph documents or import PDF files, and the app then recognizes the text and adds it to the conversation context.
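
Feeding OCR output into the conversation comes down to building the prompt. The sketch below assumes that the recognized text is already available as a string, for example from ML Kit's `Text.getText()`. The `PromptBuilder` class, the prompt wording, and the character cap are hypothetical. A character count is only a rough stand-in for the model's token limit.

```java
final class PromptBuilder {
    private PromptBuilder() {}

    /**
     * Combines OCR-extracted text with the user's question.
     * maxOcrChars (must be >= 0) caps the extracted text so that a long
     * document cannot push the question out of the model's context window.
     */
    static String build(String userMessage, String ocrText, int maxOcrChars) {
        if (ocrText == null || ocrText.isBlank()) {
            return userMessage; // nothing recognized: send the question as-is
        }
        String extracted = ocrText.strip();
        if (extracted.length() > maxOcrChars) {
            int end = maxOcrChars;
            // Avoid cutting a surrogate pair (e.g. an emoji) in half.
            if (end > 0 && Character.isHighSurrogate(extracted.charAt(end - 1))) {
                end--;
            }
            extracted = extracted.substring(0, end);
        }
        return "Text extracted from the attached image:\n"
                + extracted
                + "\n\nUser question:\n"
                + userMessage;
    }
}
```

Placing the question after the extracted text keeps it closest to where the model starts generating. This ordering is a common convention, not something the project documents.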

The app also lets users attach videos, images, and documents (PDFs, code files, and text files), and it previews these attachments in the conversation. Videos play in the native player, and documents open in a suitable installed app via a system Intent.
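
The attachment handling described above amounts to routing each file type to a viewer. The mapping below is an illustrative guess. The `AttachmentAction` enum, the extension list, and `AttachmentRouter` are hypothetical. A production app would more likely route on the MIME type reported by `ContentResolver.getType` than on the file name.

```java
import java.util.Locale;
import java.util.Map;

/** What the chat screen does when an attachment is tapped (hypothetical). */
enum AttachmentAction { SHOW_IMAGE_IN_APP, PLAY_IN_NATIVE_PLAYER, OPEN_WITH_INTENT }

final class AttachmentRouter {
    private AttachmentRouter() {}

    // Illustrative extension table, not the project's actual list.
    private static final Map<String, AttachmentAction> BY_EXTENSION = Map.of(
            "jpg", AttachmentAction.SHOW_IMAGE_IN_APP,
            "jpeg", AttachmentAction.SHOW_IMAGE_IN_APP,
            "png", AttachmentAction.SHOW_IMAGE_IN_APP,
            "mp4", AttachmentAction.PLAY_IN_NATIVE_PLAYER,
            "webm", AttachmentAction.PLAY_IN_NATIVE_PLAYER);

    static AttachmentAction route(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        // Documents (PDFs, code, plain text) and anything unrecognized are
        // handed to another installed app through a system Intent.
        return BY_EXTENSION.getOrDefault(ext, AttachmentAction.OPEN_WITH_INTENT);
    }
}
```

For the Intent case, the usual Android pattern is to build an `Intent.ACTION_VIEW` intent with `setDataAndType(uri, mimeType)` and add `Intent.FLAG_GRANT_READ_URI_PERMISSION`. This lets the receiving app read a file shared through a `FileProvider` URI.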

Section 07

Privacy-First Design Philosophy

Privacy protection is the core design principle of this project. All computation runs locally on the device. No internet connection is needed after the initial model download, and conversation data never leaves the user's device. The app collects no logs and performs no tracking of any kind. This matters especially for users handling sensitive information, such as medical questions, legal matters, or confidential business discussions.

Section 08

Polished User Interface and Interaction

The app uses Material 3 dynamic theming, supports dark mode, and animates transitions between interface elements. Replies stream into the conversation as they are generated, which produces a typing effect. A collapsible OCR log card lets users inspect the detailed results of image recognition. Sidebar navigation and a responsive multi-column layout keep the app usable on screens of different sizes.
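
Streaming display boils down to appending each generated chunk to the message on screen. This sketch assumes that the engine reports progress through a callback that carries the newly generated text and a flag marking the last chunk. `StreamingMessage` and its `render` hook are hypothetical; they stand in for whatever state the Compose UI observes.

```java
import java.util.function.Consumer;

/**
 * Accumulates streamed chunks into the message shown on screen.
 * Assumes each callback carries only the newly generated text plus a
 * flag that marks the final chunk.
 */
final class StreamingMessage {
    private final StringBuilder text = new StringBuilder();
    private final Consumer<String> render; // receives the full text so far
    private boolean done;

    StreamingMessage(Consumer<String> render) {
        this.render = render;
    }

    /** Appends one chunk and re-renders the message built so far. */
    synchronized void onPartialResult(String chunk, boolean isLast) {
        if (done) {
            throw new IllegalStateException("stream already finished");
        }
        text.append(chunk);
        done = isLast;
        render.accept(text.toString());
    }

    synchronized boolean isDone() {
        return done;
    }

    synchronized String current() {
        return text.toString();
    }
}
```

The methods are `synchronized` because generation callbacks typically arrive on a background thread. In an Android app, `render` would usually also post the text to the main thread before it updates UI state.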