Chapter 01
Agave: A High-Performance LLM Inference Engine Built from Scratch with Zig
Agave is a high-performance LLM inference engine written entirely in Zig, with zero external machine-learning dependencies. It implements all kernels, quantization schemes, and model logic from scratch, supporting 7 model architectures, 5 computation backends, and over 20 quantization types, along with features such as a layered KV cache, multi-modal vision support, an HTTP server, and an interactive REPL. This post breaks down its design, performance, features, and use cases.