Section 01
TierKV: A Cross-Node Distributed KV Cache System That Speeds Up LLM Long-Context Inference by 7x [Introduction]
TierKV is a cross-node distributed KV cache system. It targets the cold-start problem caused by KV cache eviction in LLM long-context inference: instead of discarding evicted KV caches, it retains them across the network in a three-tier architecture (GPU Hot Tier, LAN Cold KV Tier, WiFi Cold SSM Tier). This cuts the time to first token (TTFT) of long-context inference from 30 seconds to 4 seconds (roughly a 7x speedup) and offers a cost-effective path to extending LLM inference context length.
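To make the tiering concrete, here is a minimal sketch of how such a three-tier lookup might behave: check the GPU hot tier first, fall through to the slower cold tiers, and only recompute the prefill on a full miss (the cold-start case the numbers above describe). This is not TierKV's actual implementation; the class, the dict-backed tiers, and the `prefix_hash` keying are all placeholder assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TieredKVCache:
    """Illustrative three-tier lookup (hypothetical, not TierKV's API)."""
    gpu_hot: dict = field(default_factory=dict)    # KV blocks resident in GPU memory
    lan_cold: dict = field(default_factory=dict)   # stand-in for a LAN peer's cold KV store
    wifi_cold: dict = field(default_factory=dict)  # stand-in for the WiFi-reachable SSM store

    def get(self, prefix_hash: str) -> Optional[bytes]:
        # 1. Hot tier: cache already on the GPU, fastest path.
        if prefix_hash in self.gpu_hot:
            return self.gpu_hot[prefix_hash]
        # 2. LAN cold tier: fetch a previously evicted KV cache from a
        #    LAN peer and promote it back into the hot tier.
        if prefix_hash in self.lan_cold:
            kv = self.lan_cold[prefix_hash]
            self.gpu_hot[prefix_hash] = kv
            return kv
        # 3. WiFi cold tier: slowest network path, still far cheaper
        #    than recomputing the whole prefill.
        if prefix_hash in self.wifi_cold:
            kv = self.wifi_cold[prefix_hash]
            self.gpu_hot[prefix_hash] = kv
            return kv
        # Full miss: the caller must recompute the prefill from scratch.
        return None

    def evict(self, prefix_hash: str) -> None:
        # Instead of discarding on eviction, demote hot KV blocks to the LAN tier.
        kv = self.gpu_hot.pop(prefix_hash, None)
        if kv is not None:
            self.lan_cold[prefix_hash] = kv
```

Under this reading, a `None` from `get` is the 30-second path (full prefill recomputation), while any tier hit is a network fetch plus promotion back into GPU memory, which is where the TTFT reduction would come from.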