Section 01
AVA-VLA: A New Paradigm for Visual-Language-Action Models Enabling Robots to Think Less and Act Faster
The AVA-VLA project, accepted at ICML 2026, proposes a novel visual-language-action (VLA) model architecture. It targets the classic dilemma of traditional VLA models: "the more you think, the slower you act; the less you think, the more errors you make." Through three key mechanisms (latent reasoning, reinforcement-learning-based denoising, and adaptive early exit), it strikes a dynamic balance between efficiency and accuracy, allowing robots to sharply reduce their inference steps while maintaining control accuracy. This matters greatly for real-time robot control scenarios; a minimal sketch of the early-exit idea follows.
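To make the adaptive early-exit idea concrete, here is a minimal Python sketch of an iterative action denoiser that stops as soon as successive action estimates stabilize, so "easy" states consume fewer inference steps. All names here (`denoise_step`, `predict_action`), the toy denoiser, and the convergence criterion are illustrative assumptions, not the paper's actual mechanism.

```python
# Hypothetical sketch of adaptive early exit in an iterative action denoiser.
# None of these names or thresholds come from the AVA-VLA paper; they are
# illustrative assumptions about how such a mechanism could look.
import numpy as np

def denoise_step(action, step, rng):
    # Stand-in for one learned denoising step; here it just injects a
    # geometrically shrinking perturbation so the estimate converges.
    return action + (0.5 ** step) * rng.normal(scale=0.05, size=action.shape)

def predict_action(initial_noise, max_steps=10, tol=1e-3, seed=0):
    """Run the denoiser, exiting early once successive action estimates
    stop changing, instead of always spending the full step budget."""
    rng = np.random.default_rng(seed)
    action = initial_noise
    for step in range(1, max_steps + 1):
        new_action = denoise_step(action, step, rng)
        # Early-exit criterion: the action estimate has effectively converged.
        if np.linalg.norm(new_action - action) < tol:
            return new_action, step
        action = new_action
    return action, max_steps

action, steps_used = predict_action(np.zeros(7))  # e.g., a 7-DoF arm command
print(f"converged after {steps_used} of 10 steps")
```

In a real model, the hand-written convergence check would presumably be replaced by a learned confidence signal, which is where the reinforcement-learning component of the architecture would come in.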