Section 01
[Original Post/Introduction] Bonsai-Pot: A Lightweight Qwen3 Inference Engine Built From Scratch, Running GPU Inference Without Dequantization
bonsai-pot is an inference engine for the Qwen3 architecture, written entirely from scratch. Its core feature is that it uses wgpu (a Rust implementation of WebGPU) compute shaders to run Q1_0-quantized models directly on the GPU, with no separate dequantization step, which keeps inference lightweight and efficient. The project targets the resource constraints of deploying LLMs on edge devices and aims to provide zero-dependency, cross-platform inference.
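
To make "without dequantization" concrete, here is a minimal CPU-side sketch in Rust of the general idea: a dot product that reads packed 1-bit weights directly instead of first expanding them into a float buffer. The block layout (32 sign bits plus one f32 scale), the struct `Block1Bit`, and both function names are assumptions made for illustration only. They are not the actual Q1_0 format and not bonsai-pot's shader code.

```rust
/// Hypothetical 1-bit block: 32 sign bits sharing one scale.
/// ASSUMPTION: this layout is illustrative, not the real Q1_0 layout.
struct Block1Bit {
    scale: f32,
    bits: u32, // bit i set => weight i is +scale, clear => -scale
}

/// Dot product of one packed block with 32 activations.
/// The sign of each weight is read straight from the bit field,
/// so no f32 weight array is ever materialized.
fn dot_block(block: &Block1Bit, x: &[f32; 32]) -> f32 {
    let mut acc = 0.0f32;
    for (i, &xi) in x.iter().enumerate() {
        let sign = if (block.bits >> i) & 1 == 1 { 1.0 } else { -1.0 };
        acc += sign * xi;
    }
    // Apply the shared scale once per block.
    block.scale * acc
}

/// Reference path that dequantizes to f32 first, for comparison only.
fn dot_block_dequantized(block: &Block1Bit, x: &[f32; 32]) -> f32 {
    let mut w = [0.0f32; 32];
    for (i, wi) in w.iter_mut().enumerate() {
        *wi = if (block.bits >> i) & 1 == 1 {
            block.scale
        } else {
            -block.scale
        };
    }
    w.iter().zip(x.iter()).map(|(w, x)| w * x).sum()
}
```

Both functions compute the same value; the first simply skips the intermediate weight buffer, which is the kind of saving a GPU kernel gets when it operates on the packed representation. A real compute shader would do this per workgroup over many blocks, but the per-block arithmetic would look similar under these assumptions.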