Section 01
Vortex: Efficient Sparse Attention Inference System for AI Agents
Vortex is a programmable inference system built specifically for sparse attention algorithms. It bridges rapid prototyping and large-scale deployment through a Python-embedded front-end language and a page-centric tensor abstraction. It achieves up to 4.7x higher throughput on GLM-4 models and 1.37x on MiniMax-M2, and supports both researcher-driven innovation and AI agent-driven exploration.
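To make the idea of page-level sparse attention concrete, here is a minimal pure-Python sketch. It is not Vortex's front-end language or API, which this section does not show. It assumes one common sparse attention pattern: keys and values are grouped into fixed-size pages, each page is scored against the query using its mean key vector, and attention runs only over the top-scoring pages. All names (`split_into_pages`, `select_pages`, `sparse_attention`, `page_size`, `top_k`) are hypothetical and chosen for illustration.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def split_into_pages(rows, page_size):
    """Group per-token vectors into consecutive fixed-size pages."""
    return [rows[i:i + page_size] for i in range(0, len(rows), page_size)]


def page_summary(page):
    """Assumed summary for a page: the mean of its key vectors."""
    dim = len(page[0])
    return [sum(k[d] for k in page) / len(page) for d in range(dim)]


def select_pages(query, key_pages, top_k):
    """Return indices of the top_k pages by query-summary score, in page order."""
    scores = [dot(query, page_summary(p)) for p in key_pages]
    ranked = sorted(range(len(key_pages)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:top_k])


def sparse_attention(query, keys, values, page_size, top_k):
    """Single-query attention restricted to the selected pages."""
    key_pages = split_into_pages(keys, page_size)
    value_pages = split_into_pages(values, page_size)
    chosen = select_pages(query, key_pages, top_k)

    # Gather only the tokens that live in the selected pages.
    sel_keys = [k for i in chosen for k in key_pages[i]]
    sel_values = [v for i in chosen for v in value_pages[i]]

    # Standard scaled dot-product attention over the gathered tokens.
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, k) * scale for k in sel_keys])
    dim = len(sel_values[0])
    out = [sum(w * v[d] for w, v in zip(weights, sel_values)) for d in range(dim)]
    return out, chosen
```

With `top_k` equal to the total number of pages, this sketch reduces to ordinary dense attention for one query. Smaller values of `top_k` skip whole pages, which is where a sparse method saves memory traffic and compute.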