Section 01
[Introduction] MiniMax Sparse Attention: A New Efficient Attention Paradigm for Million-Scale Long Context
Key Information
- Mechanism: MiniMax proposes MSA (MiniMax Sparse Attention), a sparse attention mechanism that uses a lightweight indexing branch to dynamically select the most relevant KV blocks
- Effect: On a 109B-parameter model, MSA reduces computational load by 28.4x while delivering performance comparable to GQA (grouped-query attention)
- Source: Xunhao Lai et al. (MiniMax team), published on arXiv on June 11, 2026. Open-source code and models are available at https://github.com/MiniMax-AI/MSA and https://huggingface.co/MiniMaxAI/MiniMax-M3
- Keywords: Sparse attention, long context, large language model, MiniMax, GQA, inference acceleration, GPU optimization
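The "index first, then attend only to selected KV blocks" idea in the Mechanism bullet can be illustrated with a minimal NumPy sketch. This is not MSA's implementation. The source does not describe how the indexing branch scores blocks, so the scorer below (dot product of the query with each block's mean-pooled key) is a hypothetical stand-in, and the function names, `block_size`, and `top_k` are illustrative assumptions. The sketch covers a single query and a single head; production kernels batch this work on the GPU.

```python
import numpy as np

def block_scores(q, K, block_size):
    """Hypothetical lightweight indexer (NOT MSA's actual branch):
    score each KV block by q . mean(keys in block)."""
    n, d = K.shape
    num_blocks = n // block_size
    block_means = K.reshape(num_blocks, block_size, d).mean(axis=1)
    return block_means @ q  # shape: (num_blocks,)

def sparse_block_attention(q, K, V, block_size, top_k):
    """Exact softmax attention restricted to the top_k highest-scoring KV blocks."""
    n, d = K.shape
    assert n % block_size == 0, "sketch assumes n is a multiple of block_size"
    scores = block_scores(q, K, block_size)
    top_k = min(top_k, scores.shape[0])
    selected = np.sort(np.argsort(scores)[-top_k:])  # chosen block ids, ascending
    # Expand block ids into the token indices they cover.
    idx = (selected[:, None] * block_size + np.arange(block_size)[None, :]).reshape(-1)
    K_sel, V_sel = K[idx], V[idx]
    logits = K_sel @ q / np.sqrt(d)
    logits -= logits.max()  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum()
    return weights @ V_sel, selected

def dense_attention(q, K, V):
    """Full softmax attention over every key, used as a reference."""
    d = K.shape[1]
    logits = K @ q / np.sqrt(d)
    logits -= logits.max()
    w = np.exp(logits)
    w /= w.sum()
    return w @ V

# Usage: 1,024 keys split into 16 blocks of 64; attend to only 4 blocks.
rng = np.random.default_rng(0)
n, d, block_size = 1024, 64, 64
q = rng.standard_normal(d)
K = rng.standard_normal((n, d))
V = rng.standard_normal((n, d))
out, blocks = sparse_block_attention(q, K, V, block_size, top_k=4)
# out has shape (d,); blocks holds the 4 selected block indices
```

With `top_k=4` out of 16 blocks, the exact attention step touches 256 of the 1,024 keys, at the extra cost of 16 cheap indexer scores. Setting `top_k` to the total number of blocks recovers dense attention, which makes a convenient sanity check for any block-sparse kernel.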
This article analyzes MSA from several angles: background, architecture, optimization, and experiments.