Section 01
Introduction / Original Post: ExLlamaV3: The Ultimate Quantized Inference Solution for Running Large Models Locally on Consumer GPUs
ExLlamaV3 is a local inference library for large language models, optimized for consumer GPUs. It supports the new EXL3 quantization format, dynamic batching, speculative decoding, and multimodal inference, allowing everyday users to run models of 70+ billion parameters efficiently on their own hardware.
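To make the intended workflow concrete, here is a minimal sketch of loading an EXL3-quantized model and generating text. The class names (Config, Model, Cache, Tokenizer, Generator), call signatures, and the model path are assumptions modeled loosely on the example scripts in the ExLlamaV3 repository, not a verified API reference; consult the repository's examples for the exact interface.

```python
# Minimal sketch of local inference with ExLlamaV3.
# NOTE: class names and call signatures are assumptions based on the
# library's example scripts; verify against the ExLlamaV3 repository.
from exllamav3 import Config, Model, Cache, Tokenizer, Generator

# Path to an EXL3-quantized model directory (hypothetical example path).
model_dir = "/models/Llama-3.1-70B-Instruct-exl3-4bpw"

config = Config.from_directory(model_dir)    # read model config and weight metadata
model = Model.from_config(config)            # build the model from the config
cache = Cache(model, max_num_tokens=8192)    # KV cache used for dynamic batching
model.load()                                 # load quantized weights onto the GPU(s)
tokenizer = Tokenizer.from_config(config)

generator = Generator(model=model, cache=cache, tokenizer=tokenizer)
output = generator.generate(
    prompt="Explain EXL3 quantization in one paragraph.",
    max_new_tokens=200,
)
print(output)
```

Under this assumed flow, a 70B-parameter model quantized to roughly 4 bits per weight fits in the memory of a single high-end consumer GPU, which is the use case the library targets.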