Section 01
[Introduction] Kairu: A High-Performance Speculative Decoding Engine for HuggingFace Models
Kairu is an open-source speculative decoding engine built specifically for HuggingFace models. It provides EAGLE-style draft generation, dynamic early exit, and token budget control. It is designed to substantially speed up large language model (LLM) inference without sacrificing output quality. Kairu is compatible with the existing HuggingFace ecosystem and supports real-time performance monitoring and cost control.
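To make the draft-and-verify idea behind speculative decoding concrete, here is a minimal sketch in plain Python. It is not Kairu's API or implementation: the toy functions draft_next and target_next, the helpers speculative_decode and greedy_decode, and the parameters k and max_new_tokens are all hypothetical names chosen for illustration. The sketch uses simple greedy verification rather than EAGLE-style drafting, and it omits the extra "bonus" token that many implementations emit when every drafted token is accepted.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a model: maps a token sequence to its greedy next token.
NextToken = Callable[[List[int]], int]


def target_next(tokens: List[int]) -> int:
    """Toy 'large' model (assumption for illustration only)."""
    return sum(tokens) % 7


def draft_next(tokens: List[int]) -> int:
    """Toy 'small' model: agrees with the target except at some positions."""
    guess = sum(tokens) % 7
    return (guess + 1) % 7 if len(tokens) % 4 == 0 else guess


def greedy_decode(prompt: List[int], target: NextToken, max_new_tokens: int) -> List[int]:
    """Baseline: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(target(tokens))
    return tokens


def speculative_decode(
    prompt: List[int],
    draft: NextToken,
    target: NextToken,
    max_new_tokens: int,
    k: int = 4,
) -> Tuple[List[int], int]:
    """Greedy draft-and-verify loop; returns the tokens and the number of verification rounds."""
    tokens = list(prompt)
    produced = 0
    rounds = 0
    while produced < max_new_tokens:
        # 1. The draft model proposes up to k tokens, never exceeding the remaining budget.
        n = min(k, max_new_tokens - produced)
        proposal: List[int] = []
        for _ in range(n):
            proposal.append(draft(tokens + proposal))

        # 2. The target model checks the proposal. A real engine would score all
        #    drafted positions in a single forward pass; here we call it per token.
        rounds += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok == expected:
                accepted.append(tok)       # draft token confirmed by the target
            else:
                accepted.append(expected)  # first mismatch: keep the target's token, stop
                break

        tokens.extend(accepted)
        produced += len(accepted)
    return tokens, rounds
```

Because every emitted token is either confirmed or supplied by the target model, the result matches plain greedy decoding with the target alone, while the number of verification rounds can be lower than the number of generated tokens. The max_new_tokens cap is only a simple stand-in for the kind of token budget the text mentions.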