Section 01
SlidingServe: A Guide to the SLO-Aware LLM Inference Scheduling System
Title: SlidingServe: SLO-Aware Sliding Window Scheduling System for LLM Online Inference
Original Author/Team: Paper Author Team (arXiv submission)
Source Platform: arXiv
Original Title: Beyond Greedy Chunking: SLO-Aware Sliding-Window Scheduling for LLM Inference
Original Link: http://arxiv.org/abs/2606.05933v1
Release Time: June 4, 2026
Core Insight: SlidingServe combines a lightweight batch latency predictor, dynamic chunking, and multi-level priority sorting to increase LLM inference throughput by up to 30% while maintaining service quality, and to reduce SLO violation rates by 16% to 53% under high load.
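To show how a batch latency predictor, dynamic chunking, and priority sorting can work together, here is a minimal sketch under stated assumptions. It is not SlidingServe's implementation. The linear cost model (`LinearLatencyModel`), its coefficients, the helper names `max_chunk_size` and `schedule_step`, the `priority_level`/`deadline_ms` fields, and the greedy token-budget allocation are all hypothetical illustrations. The sliding-window mechanism itself is not modeled. The idea: predict the latency of the next batch, size the prefill chunk so that already-running decode requests still meet their per-token SLO, and then hand that token budget to waiting requests in priority order.

```python
from dataclasses import dataclass


@dataclass
class LinearLatencyModel:
    """Hypothetical batch-latency predictor: fixed overhead plus linear per-token costs."""
    base_ms: float
    per_prefill_token_ms: float  # assumed > 0
    per_decode_seq_ms: float

    def predict(self, prefill_tokens: int, decode_seqs: int) -> float:
        return (self.base_ms
                + self.per_prefill_token_ms * prefill_tokens
                + self.per_decode_seq_ms * decode_seqs)


def max_chunk_size(model: LinearLatencyModel, decode_seqs: int,
                   slo_budget_ms: float, remaining_prompt: int) -> int:
    """Largest prefill chunk whose predicted batch latency stays within the SLO budget."""
    fixed_ms = model.predict(0, decode_seqs)  # cost of the decode-only batch
    if fixed_ms >= slo_budget_ms:
        return 0  # no headroom left for any prefill tokens
    headroom_ms = slo_budget_ms - fixed_ms
    chunk = int(headroom_ms // model.per_prefill_token_ms)
    return max(0, min(chunk, remaining_prompt))


@dataclass
class Request:
    rid: str
    priority_level: int    # hypothetical tier: lower value = more urgent
    deadline_ms: float     # hypothetical deadline used to break ties within a tier
    remaining_prompt: int  # prompt tokens not yet prefilled


def schedule_step(model: LinearLatencyModel, waiting: list[Request],
                  decode_seqs: int, tbt_slo_ms: float) -> list[tuple[str, int]]:
    """Plan one iteration: (request id, prefill tokens) pairs that fit the SLO budget."""
    total_prompt = sum(r.remaining_prompt for r in waiting)
    budget = max_chunk_size(model, decode_seqs, tbt_slo_ms, total_prompt)
    plan = []
    # Multi-level ordering: priority tier first, then earliest deadline.
    for req in sorted(waiting, key=lambda r: (r.priority_level, r.deadline_ms)):
        if budget == 0:
            break
        take = min(req.remaining_prompt, budget)
        if take:
            plan.append((req.rid, take))
            budget -= take
    return plan


if __name__ == "__main__":
    model = LinearLatencyModel(base_ms=4.0, per_prefill_token_ms=0.25, per_decode_seq_ms=0.5)
    waiting = [Request("A", 1, 100.0, 40), Request("B", 0, 300.0, 30), Request("C", 0, 200.0, 10)]
    print(schedule_step(model, waiting, decode_seqs=8, tbt_slo_ms=20.0))
```

With 8 running decodes, the decode-only batch is predicted at 8 ms. That leaves 12 ms of a 20 ms budget, or 48 prefill tokens. These are split as C (10), B (30), then A (8). Unlike a fixed chunk size, the chunk here shrinks automatically as decode load rises, which is the general motivation the summary attributes to dynamic chunking.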