Section 01
[Introduction] Stream LLM: Browser-side Streaming LLM Inference via WebGPU and Model Sharding
stream-llm is an open-source project that enables client-side LLM inference without server-side GPUs. It splits GGUF models into hierarchical shards and runs inference in the browser via WebGPU, offering a new approach to edge computing and privacy-preserving inference.
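The shard-splitting idea can be sketched in a few lines. The helper below is purely illustrative (the names `planShards` and `Shard` are assumptions, not stream-llm's actual API): it computes byte ranges for fixed-size shards of a model file, which a browser client could then fetch on demand with HTTP Range requests.

```typescript
// Hypothetical sketch: plan byte ranges for splitting a GGUF file into
// fixed-size shards that a browser can fetch incrementally.
// These names are illustrative, not stream-llm's real API.

interface Shard {
  index: number;
  start: number; // inclusive byte offset
  end: number;   // exclusive byte offset
}

function planShards(totalBytes: number, shardBytes: number): Shard[] {
  const shards: Shard[] = [];
  for (let start = 0, i = 0; start < totalBytes; start += shardBytes, i++) {
    shards.push({
      index: i,
      start,
      end: Math.min(start + shardBytes, totalBytes),
    });
  }
  return shards;
}

// Each shard could then be fetched lazily, e.g.:
// fetch(modelUrl, { headers: { Range: `bytes=${s.start}-${s.end - 1}` } })

// Example: a ~4.7 GB model split into 64 MiB shards.
const plan = planShards(4_700_000_000, 64 * 1024 * 1024);
console.log(plan.length, plan[0], plan[plan.length - 1]);
```

Range-based fetching like this lets the first shards start loading (and the first layers start running) before the whole model has arrived, which is the core of the streaming idea.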