Section 01
Introduction: Full-process Solution for Edge-side Large Model Inference on RK3588 NPU
This project demonstrates a complete edge-side LLM inference solution on the Rockchip RK3588/RK3588S NPU, covering model conversion, quantized deployment, and an Ollama-compatible API service, and it provides a reproducible technical path for running large language models on edge AI devices. The goal is to run open-source models such as Google Gemma4 E2B on the RK3588 NPU using a layered architecture; the project forms a sister repository to the kernel-driver project rknpu-rk3588.
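To make the "Ollama-compatible API service" layer concrete, here is a minimal sketch of a server that accepts Ollama's `POST /api/generate` request shape and streams back newline-delimited JSON chunks in Ollama's response format. The NPU inference call is stubbed out as `fake_npu_generate` (a hypothetical placeholder; the real binding to the Rockchip runtime is not shown in this introduction), and the handler/function names are illustrative, not part of the project's actual code.

```python
import json
from datetime import datetime, timezone
from http.server import BaseHTTPRequestHandler, HTTPServer


def fake_npu_generate(prompt):
    """Placeholder for the real NPU inference call (hypothetical stub)."""
    for token in ["Hello", ",", " world"]:
        yield token


def ollama_stream_chunks(model, prompt, generate=fake_npu_generate):
    """Yield Ollama-style JSON lines for a streaming /api/generate reply."""
    for token in generate(prompt):
        yield json.dumps({
            "model": model,
            "created_at": datetime.now(timezone.utc).isoformat(),
            "response": token,
            "done": False,
        })
    # Final chunk signals completion, mirroring Ollama's streaming protocol.
    yield json.dumps({
        "model": model,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "response": "",
        "done": True,
    })


class GenerateHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/api/generate":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        self.send_response(200)
        self.send_header("Content-Type", "application/x-ndjson")
        self.end_headers()
        for line in ollama_stream_chunks(body.get("model", ""),
                                         body.get("prompt", "")):
            self.wfile.write((line + "\n").encode("utf-8"))


if __name__ == "__main__":
    # Bind to the port Ollama clients expect by default (11434).
    HTTPServer(("0.0.0.0", 11434), GenerateHandler).serve_forever()
```

Because the wire format matches Ollama's streaming NDJSON responses, existing Ollama clients can point at this endpoint unchanged; swapping `fake_npu_generate` for a real call into the NPU runtime is the only piece tied to the RK3588.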