Section 01
SpecKV: Core Breakthroughs of Adaptive Speculative Decoding
SpecKV proposes a lightweight adaptive controller that dynamically selects the speculation length γ (the number of tokens the draft model proposes before the target model verifies them) based on the draft model's confidence and entropy signals. It reports a 56% inference speedup with almost no additional overhead and is particularly well suited to model compression scenarios.
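To make the idea concrete, the following is a minimal sketch of how a speculation length could be derived from draft-model confidence and entropy. It is an illustration under stated assumptions, not SpecKV's actual controller: the function names `entropy` and `choose_gamma`, the equal 0.5/0.5 weighting, the linear mapping, and the bounds `gamma_min=1` and `gamma_max=8` are all hypothetical choices made for this example.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def choose_gamma(
    probs: Sequence[float],
    gamma_min: int = 1,  # hypothetical lower bound on speculation length
    gamma_max: int = 8,  # hypothetical upper bound on speculation length
) -> int:
    """Map the draft model's next-token distribution to a speculation length.

    Assumed heuristic (not taken from the source): speculate further when the
    draft model is confident (high top-1 probability) and certain (low entropy).
    """
    confidence = max(probs)
    vocab_size = len(probs)
    # Normalize entropy to [0, 1] by dividing by its maximum, log(vocab_size).
    norm_entropy = entropy(probs) / math.log(vocab_size) if vocab_size > 1 else 0.0
    # Equal weighting of the two signals is an arbitrary illustrative choice.
    score = 0.5 * confidence + 0.5 * (1.0 - norm_entropy)
    gamma = gamma_min + round(score * (gamma_max - gamma_min))
    # Clamp to guard against floating-point drift outside the allowed range.
    return max(gamma_min, min(gamma_max, gamma))


peaked = [0.97, 0.01, 0.01, 0.01]    # draft model is confident
uniform = [0.25, 0.25, 0.25, 0.25]   # draft model is maximally uncertain
print(choose_gamma(peaked), choose_gamma(uniform))
```

With these assumed settings, a peaked distribution yields a longer speculation length than a uniform one, which is the behavior the paragraph describes. A real controller would presumably tune or learn this mapping against measured acceptance rates rather than use fixed weights.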