Section 01
tiny-llm Course Introduction: A Hands-On LLM Serving Course for Systems Engineers
tiny-llm is a hands-on course on LLM serving for systems engineers. Its goal is to build a vLLM-like inference system from scratch on Apple Silicon using the MLX framework. The course covers core concepts such as attention, KV caching, continuous batching, and Flash Attention, each implemented directly on low-level array APIs, so that learners work through the key techniques behind LLM inference serving themselves.
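To give a feel for what building these pieces from low-level operations involves, here is a minimal sketch of scaled dot-product attention for a single decoding step with a KV cache. It is an illustration only and is not taken from the course: it uses plain Python lists instead of MLX arrays so that it runs anywhere, it handles a single attention head with no batching or masking, and the names `softmax`, `attend`, `KVCache`, and `decode_step` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query over all cached keys/values."""
    scale = 1.0 / math.sqrt(len(query))
    # One score per cached token: scaled dot product of query and key.
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # Output is the attention-weighted sum of the cached value vectors.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


class KVCache:
    """Hypothetical cache: stores the key and value of every token seen so far."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, key, value):
        self.keys.append(key)
        self.values.append(value)


def decode_step(cache, query, key, value):
    """Add the new token's key/value to the cache, then attend over the cache."""
    cache.append(key, value)
    return attend(query, cache.keys, cache.values)


cache = KVCache()
first = decode_step(cache, [1.0, 0.0], [1.0, 0.0], [2.0, 4.0])
second = decode_step(cache, [0.0, 0.0], [0.0, 1.0], [4.0, 8.0])
```

The point of the cache is that each decoding step only computes the key and value for the newest token and reuses the stored ones for all earlier tokens, instead of recomputing them for the whole sequence. In this toy run, the first step attends over a single cached token, so its output equals that token's value vector; the second step uses an all-zero query, so both cached tokens receive equal weight and the output is the average of their values.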