Section 01
Introduction: SinkRouter—A New Framework for Long-Context Decoding Acceleration
SinkRouter is a training-agnostic selective routing framework for long-context decoding. It builds on a characterization of the Attention Sink phenomenon as a set of stable, reachable, and error-controllable fixed points: it detects sink signals and skips the computations whose outputs are near zero. Combined with hardware-aware Triton kernels, it achieves a 2.03x speedup at a 512K context length while maintaining competitive accuracy, which makes it an efficient option for deploying long-context large models.
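The general idea of selective routing can be illustrated with a toy, single-head NumPy sketch. This is not SinkRouter's detector or kernel. The sink position (index 0), the Cauchy-Schwarz bound used as the "sink signal", the threshold `eps`, and the helper names `sink_weight_lower_bound` and `routed_decode_step` are all hypothetical. The sketch only shows how a cheap test can certify that the sink dominates a head's attention. When the sink's value vector is near zero, the head's output is then near zero with a bounded error, and the full attention computation can be skipped.

```python
import numpy as np


def full_attention(q, K, V):
    """Reference single-query softmax attention (numerically stable)."""
    scores = K @ q / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V


def sink_weight_lower_bound(q, k_sink, max_other_key_norm, n_keys):
    """Hypothetical sink signal: a lower bound on the softmax weight of key 0.

    By Cauchy-Schwarz, q . k_j <= ||q|| * max_j ||k_j|| for every non-sink key j,
    so the bound needs only the sink key and one cached scalar.
    """
    scale = 1.0 / np.sqrt(q.shape[-1])
    s_sink = float(q @ k_sink) * scale
    s_upper = float(np.linalg.norm(q)) * max_other_key_norm * scale
    gap = min(s_upper - s_sink, 700.0)  # clip to avoid overflow in exp
    return 1.0 / (1.0 + (n_keys - 1) * np.exp(gap))


def routed_decode_step(q, K, V, eps=1e-3):
    """Toy decode step for one head: skip attention when the sink provably dominates.

    Returns (output, skipped). When skipped, the output is approximated by zeros,
    which is accurate only if the sink's value vector is itself near zero.
    """
    # Assumption: a real system would maintain this norm incrementally as keys
    # are appended to the KV cache instead of recomputing it every step.
    max_other = float(np.linalg.norm(K[1:], axis=-1).max())
    p_lb = sink_weight_lower_bound(q, K[0], max_other, K.shape[0])
    if p_lb >= 1.0 - eps:
        return np.zeros(V.shape[-1]), True
    return full_attention(q, K, V), False


# Usage with synthetic data: a sink key aligned with the query and a zero sink value.
rng = np.random.default_rng(0)
d, n = 4, 1001
K = 0.1 * rng.standard_normal((n, d))
V = rng.standard_normal((n, d))
K[0] = [10.0, 0.0, 0.0, 0.0]
V[0] = 0.0

q_sink = np.array([10.0, 0.0, 0.0, 0.0])   # attends almost entirely to the sink
out_sink, skipped_sink = routed_decode_step(q_sink, K, V)

q_other = np.array([0.0, 10.0, 0.0, 0.0])  # orthogonal to the sink key
out_other, skipped_other = routed_decode_step(q_other, K, V)
```

When a head is skipped, its true output satisfies `||out|| <= p0 * ||v_0|| + (1 - p0) * max_j ||v_j||`, where `p0` is the sink weight, so `eps` directly controls the approximation error. The Cauchy-Schwarz bound is deliberately conservative; it trades missed skipping opportunities for a guarantee that every skip is safe.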