Section 01
TempoVLA: Guide to the Speed-Controllable Vision-Language-Action Model
Key Highlights of TempoVLA
The research team proposes TempoVLA to address a limitation of existing Vision-Language-Action (VLA) models: they execute at a fixed speed. TempoVLA lets a robot move quickly through low-risk phases and slow down for precise operations in high-risk contact phases. Its core insight is that motion amplitude determines execution speed. Flexible speed control is achieved with a two-component architecture: Variable-Speed Trajectory Augmentation (VSTA) and a Speed Conditioning Mechanism. The approach has been validated on both simulated and real-world tasks, providing a new foundation for robot operating systems.
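To make the link between motion amplitude and execution speed concrete, here is a minimal sketch of variable-speed trajectory augmentation. It is not the paper's implementation: it assumes a demonstration stored as end-effector waypoints sampled at a fixed control rate, retimes it by linear interpolation, and pairs each retimed copy with the speed factor as a conditioning label. The names `resample_trajectory`, `delta_actions`, and `augment` are hypothetical.

```python
from typing import Dict, List, Sequence

Point = List[float]


def resample_trajectory(positions: Sequence[Sequence[float]], speed: float) -> List[Point]:
    """Retime a waypoint trajectory so it plays back `speed` times faster.

    Assumption: linear interpolation between waypoints at a fixed control rate.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    if len(positions) < 2:
        raise ValueError("need at least two waypoints")
    n_in = len(positions)
    # Fewer control steps for speed > 1, more for speed < 1; endpoints are kept.
    n_out = max(2, round((n_in - 1) / speed) + 1)
    out: List[Point] = []
    for i in range(n_out):
        x = i * (n_in - 1) / (n_out - 1)  # fractional index into the original
        lo = min(int(x), n_in - 2)
        frac = x - lo
        out.append([a + frac * (b - a) for a, b in zip(positions[lo], positions[lo + 1])])
    return out


def delta_actions(traj: Sequence[Sequence[float]]) -> List[Point]:
    """Per-step displacement between consecutive waypoints."""
    return [[b - a for a, b in zip(p, q)] for p, q in zip(traj, traj[1:])]


def augment(positions: Sequence[Sequence[float]], speeds: Sequence[float]) -> List[Dict]:
    """Build one (speed label, action sequence) training sample per speed factor."""
    return [
        {"speed": s, "actions": delta_actions(resample_trajectory(positions, s))}
        for s in speeds
    ]


if __name__ == "__main__":
    # Straight-line demo: 21 waypoints, 0.05 units apart along x.
    demo = [[0.05 * i, 0.0] for i in range(21)]
    for sample in augment(demo, [0.5, 1.0, 2.0]):
        steps = len(sample["actions"])
        amplitude = abs(sample["actions"][0][0])
        print(f"speed={sample['speed']}: {steps} steps, first-step amplitude={amplitude:.3f}")
```

In this sketch, doubling the speed factor halves the number of control steps and doubles the per-step displacement, so a policy conditioned on the speed label would learn to emit larger or smaller actions for the same observation.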
Original Authors/Source
- Author Team: Paper author team
- Source: arXiv
- Original Title: TempoVLA: Learning Speed-Controllable Vision-Language-Action Policies
- Link: http://arxiv.org/abs/2606.06491v1
- Publication Date: June 4, 2026