Section 01
DASH-KV: Asymmetric Hashing Accelerates Long-Context LLM Inference, Reducing Complexity from Quadratic to Linear
DASH-KV is an acceleration framework that addresses the computational bottleneck in long-context LLM inference. Its core innovation is to recast the attention mechanism as Approximate Nearest Neighbor Search (ANNS) via asymmetric deep hashing, which reduces computational complexity from O(N²) to O(N) while keeping generation quality comparable to full attention. On the LongBench benchmark, the framework significantly reduces latency and memory usage, offering a feasible path to the practical deployment of long-context LLMs.
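
To make the idea of attention as nearest-neighbor search concrete, here is a minimal NumPy sketch. It is not DASH-KV's method: the framework's learned asymmetric hash functions are not described in this section, so the sketch substitutes random-hyperplane hashing (SimHash) shared by queries and keys. The names `hash_codes`, `hashed_attention`, `planes`, and `top_k` are hypothetical and chosen only for illustration.

```python
import numpy as np

def hash_codes(x, planes):
    """Binary codes from the sign of projections onto random hyperplanes.
    Assumption: SimHash stands in for the learned deep hash functions."""
    return (x @ planes) > 0  # shape (n, n_bits), dtype bool

def full_attention(q, K, V):
    """Reference softmax attention for a single query over all N keys."""
    scores = K @ q / np.sqrt(q.shape[0])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

def hashed_attention(q, K, V, planes, top_k):
    """Attend only to the top_k keys whose codes are nearest in Hamming distance."""
    q_code = hash_codes(q[None, :], planes)[0]
    k_codes = hash_codes(K, planes)  # in a real system these would be cached with the keys
    hamming = np.count_nonzero(k_codes != q_code, axis=1)
    idx = np.argpartition(hamming, top_k - 1)[:top_k]  # candidate set, unordered
    # Exact attention restricted to the selected candidates.
    return full_attention(q, K[idx], V[idx]), idx

rng = np.random.default_rng(0)
n_keys, dim, n_bits = 1024, 32, 64
K = rng.standard_normal((n_keys, dim))
V = rng.standard_normal((n_keys, dim))
q = rng.standard_normal(dim)
planes = rng.standard_normal((dim, n_bits))

approx, selected = hashed_attention(q, K, V, planes, top_k=64)
exact = full_attention(q, K, V)
print("keys attended:", selected.size, "of", n_keys)
print("max abs difference from full attention:", np.abs(approx - exact).max())
```

The sketch only illustrates the selection step: comparing short binary codes is much cheaper than computing full dot products, and exact attention is then computed over a small candidate set. How closely the result tracks full attention depends on the hash functions and on `top_k`, which is why a learned hash would be expected to behave differently from this random one.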