Section 01
DASH-KV: Asymmetric Hashing Enables Linear-Complexity Inference for Long-Context LLMs (Introduction)
DASH-KV reframes attention as an approximate nearest neighbor (ANN) search over the cached keys, implemented with asymmetric deep hashing. This reduces the complexity of long-context LLM inference from O(N²) to O(N) in the sequence length N while keeping performance comparable to full-precision attention, which addresses the bottleneck that standard attention creates when processing long sequences.
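To make the idea of attention as nearest neighbor search with asymmetric hashing concrete, here is a minimal sketch in plain Python. It is not the DASH-KV implementation. Random hyperplanes stand in for the learned deep hash functions, and every name (`make_projection`, `hash_key`, `asymmetric_scores`, `sparse_attention`, `top_k`) is hypothetical. The asymmetry shown is that cached keys are binarized to ±1 codes while the query stays real-valued. The sketch still scans every code once per query, so it illustrates only the scoring and selection step, not the claimed overall complexity.

```python
import math
import random


def make_projection(dim, bits, seed=0):
    """Random Gaussian hyperplanes (assumption: a stand-in for a learned hash)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]


def project(planes, vec):
    """Project a vector onto each hyperplane normal."""
    return [sum(p * v for p, v in zip(plane, vec)) for plane in planes]


def hash_key(planes, key):
    """Binarize a key to a +/-1 code; this is the compact, cached side."""
    return [1 if x >= 0 else -1 for x in project(planes, key)]


def asymmetric_scores(planes, query, key_codes):
    """Score a real-valued projected query against binary key codes."""
    q_proj = project(planes, query)
    return [sum(q * c for q, c in zip(q_proj, code)) for code in key_codes]


def sparse_attention(query, keys, values, key_codes, planes, top_k):
    """Pick top_k candidates by hash score, then run exact softmax attention
    over those candidates only."""
    scores = asymmetric_scores(planes, query, key_codes)
    top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:top_k]
    scale = 1.0 / math.sqrt(len(query))
    logits = [scale * sum(q * k for q, k in zip(query, keys[i])) for i in top]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    out = [
        sum(w * values[i][d] for w, i in zip(weights, top)) / total
        for d in range(len(values[0]))
    ]
    return out, top


# Toy data: 200 cached keys/values of dimension 16, hashed to 64-bit codes.
rng = random.Random(1)
dim, n, bits = 16, 200, 64
keys = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
values = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
planes = make_projection(dim, bits)
key_codes = [hash_key(planes, k) for k in keys]

query = keys[42]  # a query identical to one cached key
out, top = sparse_attention(query, keys, values, key_codes, planes, top_k=8)
```

Because the query here equals key 42, that key's code matches the sign of every projected query component, so it receives the highest possible hash score and is always among the selected candidates. Exact attention is then computed over 8 keys instead of 200.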