Section 01
[Overview] IntAttention: A Pure Integer Attention Inference Acceleration Scheme for Edge Devices
IntAttention is the open-source implementation of an MLSys 2026 paper. It proposes an all-integer attention pipeline that enables high-fidelity, high-speed inference of Large Language Models (LLMs) and Vision Transformers (ViTs) on ARM CPUs, addressing the compute bottleneck of deploying Transformer models on edge devices.
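To illustrate the general idea of an all-integer attention pipeline (this is a hypothetical sketch, not the paper's actual algorithm): the query/key/value inputs are quantized to small integers, the score matrix is accumulated in wide integers, and the floating-point softmax is replaced by an integer-friendly surrogate. The `shift` parameter and the power-of-two weighting below are illustrative choices, not values from the paper.

```python
# Hypothetical sketch of one integer-only attention step.
# q, k, v: lists of integer rows (e.g. values in the int8 range).
# All arithmetic stays in Python integers: no float ops anywhere.

def int_attention(q, k, v, shift=4):
    n, d = len(q), len(q[0])
    # 1. Integer score matrix S = Q K^T, accumulated in wide integers.
    scores = [[sum(q[i][t] * k[j][t] for t in range(d)) for j in range(n)]
              for i in range(n)]
    out = []
    for row in scores:
        m = max(row)
        # 2. Softmax surrogate: weight 2^(8 + (s - max) >> shift),
        #    a shift-based exponential approximation kept in integers.
        w = [1 << max(0, 8 + ((s - m) >> shift)) for s in row]
        total = sum(w)
        # 3. Weighted sum of V rows with integer (floor) division.
        out.append([sum(w[j] * v[j][t] for j in range(n)) // total
                    for t in range(d)])
    return out

q = [[3, -1], [0, 2]]
k = [[1, 0], [2, 1]]
v = [[10, 0], [0, 10]]
print(int_attention(q, k, v))  # → [[3, 6], [3, 6]]
```

A real edge deployment would map step 1 onto int8 dot-product hardware instructions (e.g. ARM SDOT) and fold quantization scales into the final rescale, but the sketch shows why removing all floating-point operations from the attention path is attractive on integer-only or SIMD-constrained CPUs.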