Section 01
DashAttention: A Differentiable and Adaptive Sparse Hierarchical Attention Mechanism
DashAttention is a sparse hierarchical attention mechanism proposed in May 2026 to address the quadratic computation and memory cost of full attention in long-context modeling for large language models (LLMs). Its core idea is to use the α-entmax transformation for adaptive sparse block selection. It maintains accuracy comparable to full attention at 75% sparsity, and its inference speed surpasses that of FlashAttention-3.
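To make the role of α-entmax concrete, here is a minimal NumPy sketch of the generic α-entmax transformation, computed by bisection on its threshold. It is not DashAttention's actual kernel: the function name `entmax_bisect`, the choice α = 1.5, and the four block scores are all assumptions made for illustration. The point it shows is that, unlike softmax, α-entmax with α > 1 can assign exactly zero weight to low-scoring entries, which is what allows a block selector to drop blocks adaptively instead of using a fixed top-k.

```python
import numpy as np

def entmax_bisect(scores, alpha=1.5, n_iter=50):
    """Generic alpha-entmax over a 1-D score vector (illustrative sketch).

    Solves p_i = max((alpha - 1) * s_i - tau, 0) ** (1 / (alpha - 1))
    for the threshold tau that makes the weights sum to 1.
    """
    if alpha <= 1.0:
        raise ValueError("alpha must be > 1 (alpha -> 1 recovers softmax)")
    z = (alpha - 1.0) * np.asarray(scores, dtype=np.float64)
    d = z.shape[-1]
    exponent = 1.0 / (alpha - 1.0)

    # tau is bracketed: at `lo` the weights sum to >= 1, at `hi` to <= 1.
    lo = z.max() - 1.0
    hi = z.max() - (1.0 / d) ** (alpha - 1.0)

    for _ in range(n_iter):
        tau = 0.5 * (lo + hi)
        total = (np.clip(z - tau, 0.0, None) ** exponent).sum()
        if total >= 1.0:
            lo = tau   # threshold too low: weights sum to at least 1
        else:
            hi = tau   # threshold too high: weights sum to less than 1

    p = np.clip(z - lo, 0.0, None) ** exponent
    return p / p.sum()  # renormalize away the residual bisection error


# Hypothetical relevance scores for four key/value blocks.
block_scores = [2.0, 1.0, 0.1, -1.0]
weights = entmax_bisect(block_scores, alpha=1.5)
selected = np.flatnonzero(weights)  # indices of blocks with non-zero weight
print(weights, selected)
```

With these example scores, the two lowest-scoring blocks receive exactly zero weight, so only blocks 0 and 1 would be attended to. How many blocks survive depends on the scores themselves, which is the sense in which the selection is adaptive.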