Section 01
FASER: A Fine-Grained Speculative Decoding Optimization System for Dynamic LLM Inference
FASER is a fine-grained speculative decoding system for LLM inference under dynamic load. Traditional speculative decoding leaves the GPU underutilized when load is low and wastes computation when load is high, for example on draft tokens that verification later rejects. FASER addresses both problems with fine-grained phase management and space reuse techniques. Implemented in vLLM, it delivers throughput improvements of up to 53% and latency reductions of up to 1.92x, making it an efficient option for LLM inference services.
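The post does not describe FASER's internals, so the sketch below only illustrates where the "computational waste" in conventional speculative decoding comes from. It is not FASER's algorithm. It assumes greedy acceptance (a draft token is kept only if it matches the target model's argmax) and, for the expected-value function, that each draft token is accepted independently with a fixed probability `alpha`. The function names `greedy_verify` and `expected_tokens` are hypothetical.

```python
from typing import List, Tuple


def greedy_verify(draft: List[int], target_argmax: List[int]) -> Tuple[List[int], int]:
    """Greedy speculative verification (illustrative assumption, not FASER's rule).

    draft: the k tokens proposed by the draft model.
    target_argmax: the target model's greedy token at each of the k + 1
        positions scored in one verification pass (the k draft positions
        plus the position after the last draft token).
    Returns (emitted_tokens, wasted_positions).
    """
    assert len(target_argmax) == len(draft) + 1
    # Keep the longest prefix of the draft that agrees with the target model.
    n_accept = 0
    while n_accept < len(draft) and draft[n_accept] == target_argmax[n_accept]:
        n_accept += 1
    # The target model always contributes one token: the correction at the
    # first mismatch, or a bonus token if every draft token was accepted.
    emitted = draft[:n_accept] + [target_argmax[n_accept]]
    # Verified positions past the first mismatch are computed, then discarded.
    wasted = len(target_argmax) - len(emitted)
    return emitted, wasted


def expected_tokens(alpha: float, k: int) -> float:
    """Expected tokens emitted per draft-verify step, assuming each of the k
    draft tokens is accepted independently with probability alpha."""
    if alpha == 1.0:
        return float(k + 1)
    return (1 - alpha ** (k + 1)) / (1 - alpha)


if __name__ == "__main__":
    print(greedy_verify([5, 7, 9, 2], [5, 7, 3, 8, 1]))  # ([5, 7, 3], 2)
    k = 4
    e = expected_tokens(0.8, k)
    print(round(e, 4), round(1 - e / (k + 1), 4))  # tokens per step, wasted fraction
```

Under these assumptions, with `alpha = 0.8` and `k = 4`, a step emits about 3.36 tokens out of 5 verified positions, so roughly a third of the verification work is thrown away. When the GPU is lightly loaded that extra work is close to free, but in a heavily batched server it competes with other requests. This is the tension between low-load underutilization and high-load waste that the post attributes to traditional speculative decoding.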