Section 01
Argus Engine: Introduction to the High-Performance Rust LLM Inference Engine for ARM64 Edge Devices
Argus Engine is a large language model (LLM) inference engine written in Rust and designed specifically for ARM64 edge devices, where memory and compute are tightly constrained. Its key features are Q4_0/Q8_0 quantization, heterogeneous acceleration through OpenCL and CUDA, intelligent KV cache eviction, and a zero-copy memory architecture. By building on Rust's zero-cost abstractions and memory safety guarantees, it aims to run large models efficiently on consumer-grade ARM64 devices, making it a notable exploration of edge AI inference.
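To make the quantization feature concrete, here is a minimal sketch of Q8_0-style block quantization as the name is commonly used: weights are split into blocks of 32, and each block stores one scale plus 32 signed 8-bit values. This is not taken from the Argus Engine source. The names BlockQ8, quantize_q8, and dequantize_q8 are hypothetical, and the scale is kept as f32 for simplicity, whereas file formats that use this scheme typically store it in half precision.

```rust
/// Number of weights that share one scale factor.
/// Assumption: 32, as in the commonly used Q8_0 block layout.
const BLOCK_SIZE: usize = 32;

/// One quantized block: a shared scale plus 32 signed 8-bit values.
/// Assumption: the scale is f32 here; real formats usually store f16.
#[derive(Debug, Clone, PartialEq)]
pub struct BlockQ8 {
    pub scale: f32,
    pub quants: [i8; BLOCK_SIZE],
}

/// Quantize a slice of f32 weights into Q8_0-style blocks.
/// The input length must be a multiple of BLOCK_SIZE.
pub fn quantize_q8(values: &[f32]) -> Vec<BlockQ8> {
    assert!(
        values.len() % BLOCK_SIZE == 0,
        "input length must be a multiple of the block size"
    );
    values
        .chunks_exact(BLOCK_SIZE)
        .map(|chunk| {
            // The largest magnitude in the block maps to +/-127.
            let amax = chunk.iter().fold(0.0f32, |m, v| m.max(v.abs()));
            let scale = amax / 127.0;
            // An all-zero block has scale 0; avoid dividing by zero.
            let inv = if scale > 0.0 { 1.0 / scale } else { 0.0 };
            let mut quants = [0i8; BLOCK_SIZE];
            for (q, v) in quants.iter_mut().zip(chunk) {
                *q = (v * inv).round() as i8;
            }
            BlockQ8 { scale, quants }
        })
        .collect()
}

/// Reconstruct approximate f32 weights from quantized blocks.
pub fn dequantize_q8(blocks: &[BlockQ8]) -> Vec<f32> {
    blocks
        .iter()
        .flat_map(|b| b.quants.iter().map(move |&q| q as f32 * b.scale))
        .collect()
}
```

In this sketch each block of 32 weights occupies 36 bytes (a 4-byte scale plus 32 one-byte values) instead of 128 bytes in f32, and the round-trip error per weight is bounded by about half the block's scale. Q4_0 follows the same idea with 4-bit values packed two per byte.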