Section 01
[Introduction] Running a 30-Billion-Parameter Large Model on a Raspberry Pi Cluster: A Low-Cost Practice of Distributed Inference
Against the backdrop of rising inference costs for large language models (LLMs), the Hermes Cluster project demonstrates an efficient inference solution on extremely low-cost hardware. Using a distributed cluster of four Raspberry Pi 5 boards, it runs the 30-billion-parameter Qwen3-30B-A3B MoE model at 13.82 tok/s. Because the MoE architecture activates only about 3 billion parameters per token, inference remains tractable on such modest hardware, providing a feasible reference for edge AI deployment.
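To see why a figure on the order of 13.82 tok/s is plausible, a rough back-of-envelope sketch helps: decoding on CPUs is typically memory-bandwidth bound, so throughput is roughly bandwidth divided by the bytes of weights read per token. The bandwidth and quantization numbers below are illustrative assumptions, not measurements from the project:

```python
# Back-of-envelope estimate (illustrative assumptions, not project measurements):
# decode throughput ~ memory bandwidth / weight bytes read per token,
# counting only the ~3B parameters the MoE model activates per token.

ACTIVE_PARAMS = 3e9        # Qwen3-30B-A3B activates ~3B params per token
BYTES_PER_PARAM = 0.55     # assumed ~4.4 bits/param for a Q4-style quantization
NODE_BANDWIDTH_GBS = 17.0  # assumed LPDDR4X bandwidth of one Raspberry Pi 5
NUM_NODES = 4

bytes_per_token = ACTIVE_PARAMS * BYTES_PER_PARAM
single_node_toks = NODE_BANDWIDTH_GBS * 1e9 / bytes_per_token
# Ideal linear scaling across nodes; a real cluster loses some to communication.
cluster_ceiling = single_node_toks * NUM_NODES

print(f"per-token weight traffic: {bytes_per_token / 1e9:.2f} GB")
print(f"single-node ceiling:      {single_node_toks:.1f} tok/s")
print(f"{NUM_NODES}-node ideal ceiling:      {cluster_ceiling:.1f} tok/s")
```

Under these assumptions the four-node ideal ceiling lands above the reported 13.82 tok/s, consistent with a well-tuned distributed setup that pays some overhead for inter-node communication.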