Section 01
PowerInfer_x64: Neuron-Level Sparse Inference Makes Large Models on Consumer GPUs a Reality
PowerInfer_x64 is a neuron-level sparse LLM inference engine implemented in pure Rust. Its core innovation is a neuron-level sparsity mechanism: by predicting which neurons are likely to activate and caching those 'hot' neurons, it can run 35-billion-parameter models on consumer GPUs with 8 GB of VRAM. This engine offers a new path toward democratizing large-model inference, lowering the hardware barrier for individual developers and small-to-medium enterprises to deploy large models.
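To make the hot-neuron idea concrete, here is a minimal Rust sketch of the caching principle: track how often each neuron activates and keep the most frequently firing ones resident. The `HotNeuronCache` type and its methods are illustrative names, not the engine's actual API; a real implementation would predict activations ahead of time and manage GPU buffers.

```rust
use std::collections::HashMap;

/// Hypothetical sketch: count activations per neuron and expose the
/// most frequently firing ("hot") set worth pinning in fast memory.
struct HotNeuronCache {
    activation_counts: HashMap<usize, u64>,
    capacity: usize,
}

impl HotNeuronCache {
    fn new(capacity: usize) -> Self {
        Self { activation_counts: HashMap::new(), capacity }
    }

    /// Record that a neuron fired during a forward pass.
    fn record_activation(&mut self, neuron_id: usize) {
        *self.activation_counts.entry(neuron_id).or_insert(0) += 1;
    }

    /// Return the `capacity` most frequently activated neuron ids,
    /// breaking ties by smaller id for determinism.
    fn hot_set(&self) -> Vec<usize> {
        let mut entries: Vec<(usize, u64)> =
            self.activation_counts.iter().map(|(&id, &c)| (id, c)).collect();
        entries.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
        entries.into_iter().take(self.capacity).map(|(id, _)| id).collect()
    }
}

fn main() {
    let mut cache = HotNeuronCache::new(2);
    // Simulate activations observed across a few forward passes.
    for &n in &[3, 7, 3, 1, 3, 7] {
        cache.record_activation(n);
    }
    println!("{:?}", cache.hot_set()); // prints "[3, 7]"
}
```

In this toy run, neuron 3 fires three times and neuron 7 twice, so with capacity 2 those are the neurons kept hot while cold neurons stay in slower host memory.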