Section 01
Introduction: EVOKE — An Intelligent KV Cache Optimization Scheme for Long-Context LLM Inference
EVOKE is a KV cache optimization technique for long-context large language model (LLM) inference. It addresses KV cache overflow in long conversational sessions through two mechanisms: selective eviction of cache entries, and recovery of evicted blocks without recomputation. Together these reduce memory usage while maintaining inference efficiency. The project was released on GitHub by Anyesh under the original title 'EVOKE: EVict and recOver KV cache Entries'.
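To make the evict-and-recover idea concrete, here is a minimal sketch in Python. It is not EVOKE's actual implementation: the class name BlockKVCache, the least-recently-used eviction policy, and the in-memory "offloaded" store are all assumptions chosen for illustration, and plain strings stand in for key/value tensors. The only point it shows is that an evicted block is kept in a secondary store, so it can later be restored instead of being recomputed.

```python
from collections import OrderedDict


class BlockKVCache:
    """Toy block-level KV cache (hypothetical, for illustration only).

    Blocks that no longer fit in the resident set are moved to a secondary
    store rather than discarded, so they can be recovered without recomputation.
    """

    def __init__(self, capacity):
        self.capacity = capacity        # max number of resident blocks
        self.resident = OrderedDict()   # block_id -> payload, oldest first
        self.offloaded = {}             # evicted blocks kept for recovery
        self.recomputed = 0             # blocks that had to be rebuilt

    def put(self, block_id, payload):
        # A fresh write supersedes any stale offloaded copy.
        self.offloaded.pop(block_id, None)
        self.resident[block_id] = payload
        self.resident.move_to_end(block_id)
        self._evict_if_needed()

    def get(self, block_id, recompute):
        if block_id in self.resident:
            self.resident.move_to_end(block_id)   # mark as recently used
            return self.resident[block_id]
        if block_id in self.offloaded:
            payload = self.offloaded.pop(block_id)  # recovery, no recompute
        else:
            payload = recompute(block_id)           # fallback: rebuild block
            self.recomputed += 1
        self.put(block_id, payload)
        return payload

    def _evict_if_needed(self):
        # Assumed policy: evict the least recently used block first.
        while len(self.resident) > self.capacity:
            victim, payload = self.resident.popitem(last=False)
            self.offloaded[victim] = payload


cache = BlockKVCache(capacity=2)
for i in range(4):
    cache.put(i, f"kv-block-{i}")

# Blocks 0 and 1 were evicted, but block 0 comes back without recomputation.
restored = cache.get(0, recompute=lambda block_id: f"kv-block-{block_id}")
```

In this sketch the recompute callback is only invoked for blocks that were never cached, which is what the recomputed counter tracks. A real system would have to choose an eviction policy and a storage tier for evicted blocks; the choices here are placeholders.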