Section 01
Clairvoyant: Mitigating Head-of-Line Blocking in Serial LLM Backends via Predictive SJF Scheduling (Introduction)
Clairvoyant is a plug-and-play proxy for serial LLM backends such as Ollama and llama.cpp. It uses an XGBoost classifier to predict response lengths and schedules requests Shortest Job First (SJF) based on those predictions. This mitigates head-of-line blocking, where short requests wait behind long ones, and reduces latency for short requests by 70-76% under high load.

Original Author/Maintainer: Clairvoyant Research Team
Source: arXiv (published on June 5, 2026, link: http://arxiv.org/abs/2606.07248v1)
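
The scheduling idea described above can be illustrated with a minimal sketch: a priority queue that orders pending requests by predicted response-length class, with arrival order as the tie-breaker. This is not Clairvoyant's implementation. The names `PredictiveSJFQueue`, `QueuedRequest`, and `predict_length_class` are hypothetical, and the word-count heuristic is only a stand-in for the XGBoost classifier, whose features and classes are not described here.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class QueuedRequest:
    # Ordering uses (predicted_class, arrival_seq); the prompt is excluded.
    predicted_class: int  # lower value = shorter predicted response
    arrival_seq: int      # FIFO tie-breaker within the same class
    prompt: str = field(compare=False)


def predict_length_class(prompt: str) -> int:
    """Hypothetical stand-in for a learned length classifier.

    ASSUMPTION: buckets by prompt word count purely for illustration;
    the real system predicts response length with an XGBoost model.
    """
    words = len(prompt.split())
    if words < 5:
        return 0  # "short"
    if words < 20:
        return 1  # "medium"
    return 2      # "long"


class PredictiveSJFQueue:
    """Pending-request queue that dequeues the shortest predicted job first."""

    def __init__(self, predictor=predict_length_class):
        self._heap = []
        self._seq = itertools.count()
        self._predictor = predictor

    def push(self, prompt: str) -> None:
        item = QueuedRequest(self._predictor(prompt), next(self._seq), prompt)
        heapq.heappush(self._heap, item)

    def pop(self) -> str:
        # The serial backend would process this prompt next.
        return heapq.heappop(self._heap).prompt

    def __len__(self) -> int:
        return len(self._heap)


if __name__ == "__main__":
    queue = PredictiveSJFQueue()
    queue.push(" ".join(["word"] * 30))  # predicted long, arrives first
    queue.push("quick question")         # predicted short
    queue.push(" ".join(["word"] * 10))  # predicted medium
    while queue:
        print(len(queue.pop().split()), "words")
```

With plain FIFO ordering, the short request in this example would wait behind the long one that arrived first. Ordering by predicted class lets it run ahead, which is the head-of-line blocking effect the proxy targets.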