Section 01
Feather: Optimizing Prefix Homogeneity via Reinforcement Learning to Achieve 2-10x LLM Inference Throughput Improvement (Introduction)
Feather is a prefix-aware scheduler that uses reinforcement learning to find the optimal trade-off between batch size and prefix homogeneity, and it introduces a Chunked Hash Tree (CHT) for fast prefix detection. In integration tests with vLLM and SGLang, Feather improves throughput by 2-10x, and it performs no worse than existing solutions in scenarios without prefix sharing.
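To make the idea of chunk-based prefix detection concrete, here is a minimal Python sketch. It is not Feather's actual CHT, whose internals are not described here. It assumes that token sequences are split into fixed-size chunks and that each chunk becomes one edge in a tree, so a lookup costs one hash probe per chunk rather than one per token. The names `ChunkNode`, `ChunkedPrefixIndex`, `chunk_size`, and `shared_prefix_len` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkNode:
    # Children keyed by the next chunk of tokens (a tuple). Python's dict
    # hashes the tuple and resolves collisions by equality.
    children: dict = field(default_factory=dict)
    # Number of inserted sequences whose prefix passes through this node.
    count: int = 0


class ChunkedPrefixIndex:
    """Hypothetical chunk-keyed prefix tree (illustrative, not Feather's CHT)."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size
        self.root = ChunkNode()

    def _chunks(self, tokens):
        # Only full chunks are indexed; a trailing partial chunk is ignored.
        n_full = len(tokens) // self.chunk_size
        for i in range(n_full):
            start = i * self.chunk_size
            yield tuple(tokens[start:start + self.chunk_size])

    def insert(self, tokens):
        node = self.root
        for chunk in self._chunks(tokens):
            node = node.children.setdefault(chunk, ChunkNode())
            node.count += 1

    def shared_prefix_len(self, tokens):
        """Length in tokens of the longest chunk-aligned prefix already indexed."""
        node = self.root
        matched = 0
        for chunk in self._chunks(tokens):
            node = node.children.get(chunk)
            if node is None:
                break
            matched += self.chunk_size
        return matched


index = ChunkedPrefixIndex(chunk_size=4)
index.insert([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(index.shared_prefix_len([1, 2, 3, 4, 5, 6, 7, 8, 10, 11]))
```

Under these assumptions, matching is only chunk-aligned: a larger `chunk_size` means fewer probes per request but coarser prefix lengths. A scheduler could use the returned length to group requests that share a long prefix into the same batch.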