Section 01
Zero-TVM Project Overview: Handwritten WGSL Shaders for Browser LLM Inference
Zero-TVM is a browser-side LLM inference project that replaces the complex Apache TVM compiler stack with handwritten WGSL shaders. It runs Phi-3-mini-4k-instruct in the browser using only 10 kernel roles (27 WGSL files, ~3k lines of code) plus ~2k lines of TypeScript. On an Apple M2 Pro it reaches ~40 tok/s, only 22% slower than WebLLM's auto-tuned TVM build, while keeping the entire GPU compute stack readable and auditable.
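The "22% slower" figure can be read in two ways: as a throughput gap (Zero-TVM's tok/s is 78% of WebLLM's) or as a latency gap (each token takes 22% longer). The sketch below shows the arithmetic for both readings. The helper names are hypothetical, and the baseline numbers it prints are only what the stated ~40 tok/s and 22% imply, not measured WebLLM results.

```typescript
// Hypothetical helpers: back out the baseline throughput implied by an
// "X% slower" claim. Two readings are possible, so both are computed.

/** Milliseconds spent per generated token at a given throughput. */
function msPerToken(tokPerSec: number): number {
  return 1000 / tokPerSec;
}

/** Reading A: our throughput equals (1 - slowdown) times the baseline's. */
function baselineFromThroughputGap(oursTokPerSec: number, slowdown: number): number {
  return oursTokPerSec / (1 - slowdown);
}

/** Reading B: our per-token latency equals (1 + slowdown) times the baseline's. */
function baselineFromLatencyGap(oursTokPerSec: number, slowdown: number): number {
  // baseline latency = ours / (1 + s), so baseline tok/s = ours * (1 + s)
  return oursTokPerSec * (1 + slowdown);
}

const ours = 40;       // ~40 tok/s on M2 Pro, as stated in the overview
const slowdown = 0.22; // "22% slower"

console.log(msPerToken(ours).toFixed(1));                          // 25.0
console.log(baselineFromThroughputGap(ours, slowdown).toFixed(1)); // 51.3
console.log(baselineFromLatencyGap(ours, slowdown).toFixed(1));    // 48.8
```

Under either reading, the implied WebLLM baseline lands at roughly 49 to 51 tok/s. For Zero-TVM, ~40 tok/s corresponds to about 25 ms per generated token.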