
TempoVLA: A Vision-Language-Action Model for Robots to Execute Tasks with Controllable Speed

Researchers propose a speed-controllable VLA model that enables robots to move quickly in low-risk phases and slow down for precise operations in high-risk contact phases.

Vision-Language-Action Model, Robot Control, Speed Control, Trajectory Augmentation, Dynamic Execution

Published 2026-06-05 01:59 | Recent activity 2026-06-05 18:19 | Estimated read 7 min


Section 01

TempoVLA: Guide to the Speed-Controllable Vision-Language-Action Model

Key Highlights of TempoVLA

The research team proposes TempoVLA to address a limitation of existing Vision-Language-Action (VLA) models: they execute at a single fixed speed. TempoVLA lets robots move quickly in low-risk phases and slow down for precise operation in high-risk contact phases. Its core insight is that motion amplitude determines execution speed, and it achieves flexible speed control through a dual-component design: Variable-Speed Trajectory Augmentation (VSTA) on the data side and a speed conditioning mechanism on the model side. The approach was validated on both simulated and real-world tasks, providing a new foundation for robot manipulation systems.

Original Authors/Source

Author Team: Paper author team
Source: arXiv
Original Title: TempoVLA: Learning Speed-Controllable Vision-Language-Action Policies
Link: http://arxiv.org/abs/2606.06491v1
Publication Date: June 4, 2026

Section 02

Problem Background: Limitations of Fixed-Speed VLA


Robot manipulation tasks include low-risk transition phases (e.g., moving toward a target) and high-risk contact phases (e.g., grasping and assembly). Humans adjust their speed dynamically across these phases, but existing VLA models simply inherit the single fixed speed of their training demonstrations.

Shortcomings of Existing Solutions

Prior work on accelerating VLA models (model compression, KV cache reuse, reinforcement learning fine-tuning) can only switch between fixed speeds and cannot adjust speed dynamically during execution. Moreover, deceleration has received little attention, which makes precise, slow execution in high-risk phases difficult.

Section 03

TempoVLA Architecture: Dual Components for Speed Control


Core Insight

Motion amplitude (the pose change of the joints or end-effector per action step) determines the robot's movement speed. With actions executed at a fixed control rate, a larger amplitude covers the same distance in fewer steps (faster execution), while a smaller amplitude needs more steps (slower execution).
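A quick worked example makes this relationship concrete. The sketch below assumes, hypothetically, that the controller executes one action per control tick at a fixed rate and that every action moves the end-effector by the same straight-line amplitude; the 300 mm distance, the amplitudes, and the 10 Hz rate are illustrative values, not figures from the paper.

```python
import math


def execution_steps(total_distance_mm: float, step_amplitude_mm: float) -> int:
    """Number of control ticks needed to cover a distance at a fixed per-step amplitude."""
    if step_amplitude_mm <= 0:
        raise ValueError("step amplitude must be positive")
    return math.ceil(total_distance_mm / step_amplitude_mm)


def execution_time_s(total_distance_mm: float, step_amplitude_mm: float, control_hz: float) -> float:
    """Wall-clock time, assuming (hypothetically) exactly one action is executed per control tick."""
    return execution_steps(total_distance_mm, step_amplitude_mm) / control_hz


if __name__ == "__main__":
    # Hypothetical 300 mm approach motion executed at a 10 Hz control rate.
    for amplitude in (5.0, 10.0, 20.0):
        steps = execution_steps(300.0, amplitude)
        seconds = execution_time_s(300.0, amplitude, 10.0)
        print(f"{amplitude} mm/step -> {steps} steps, {seconds:.1f} s")
```

Doubling the per-step amplitude halves both the step count and the execution time, and halving it doubles them; this is the lever that the data-side augmentation pulls.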

1. Data Side: Variable-Speed Trajectory Augmentation (VSTA)

Acceleration: Merge adjacent actions into larger-amplitude actions so the movement completes in fewer steps
Deceleration: Split actions into smaller-amplitude actions so the movement is spread over more steps
Effect: Preserves motion semantics, accurately achieves the target speed, and improves default performance at 1x speed
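The merge/split idea can be sketched for the simplest case. This is a minimal illustration under stated assumptions, not the paper's VSTA implementation: it assumes each action is a purely translational end-effector delta (dx, dy, dz) and that speed factors are integers. Real trajectories also contain rotations (which do not add linearly), gripper commands (which cannot be divided), and observation frames that must be resampled to match; all of that is omitted here.

```python
from typing import List, Sequence, Tuple

# Hypothetical action format: a translational end-effector delta (dx, dy, dz) per control step.
Action = Tuple[float, float, float]


def merge_actions(actions: Sequence[Action], k: int) -> List[Action]:
    """Speed up by an integer factor k: sum each run of k consecutive deltas into one larger delta.

    A trailing run shorter than k is merged as-is, so total displacement is preserved.
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    merged: List[Action] = []
    for start in range(0, len(actions), k):
        group = actions[start:start + k]
        merged.append(tuple(sum(axis) for axis in zip(*group)))
    return merged


def split_actions(actions: Sequence[Action], k: int) -> List[Action]:
    """Slow down by an integer factor k: replace each delta with k equal, smaller sub-deltas."""
    if k < 1:
        raise ValueError("k must be >= 1")
    split: List[Action] = []
    for action in actions:
        sub = tuple(component / k for component in action)
        split.extend([sub] * k)
    return split


def total_displacement(actions: Sequence[Action]) -> Tuple[float, ...]:
    """Sum of all deltas; the augmentation should leave this unchanged (motion semantics preserved)."""
    return tuple(sum(axis) for axis in zip(*actions))


if __name__ == "__main__":
    demo = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    fast = merge_actions(demo, 2)  # 2 steps instead of 4: roughly 2x speed
    slow = split_actions(demo, 2)  # 8 steps instead of 4: roughly 0.5x speed
    print(len(fast), len(slow), total_displacement(fast), total_displacement(slow))
```

Both transforms change only how many steps the motion takes, never where it ends up, which is why the augmented trajectories keep the original task semantics.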

2. Model Side: Speed Conditioning Mechanism

The target speed is fed to the policy network as an explicit input, so the policy generates actions whose amplitude matches the requested speed, enabling flexible speed control.
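As a rough illustration of what "speed as an explicit input" can mean, the sketch below appends an encoded speed value to a feature vector before a toy linear action head. Everything here is hypothetical: the feature vector stands in for the VLA's fused vision-language representation, the log2 encoding is an assumed design choice, and the hand-set weights replace what a trained model would learn. The summary does not say how TempoVLA actually encodes or injects the speed.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PolicyInput:
    features: List[float]  # hypothetical fused vision-language features
    speed: float           # target speed multiplier, e.g. 0.5, 1.0, 2.0


def conditioned_vector(inp: PolicyInput) -> List[float]:
    """Append the speed command to the features so the policy sees it as an explicit input.

    log2 is an assumed encoding: 1x maps to 0.0, and 2x (+1.0) and 0.5x (-1.0) are symmetric.
    """
    if inp.speed <= 0:
        raise ValueError("speed must be positive")
    return inp.features + [math.log2(inp.speed)]


def linear_head(x: Sequence[float], weights: Sequence[Sequence[float]], bias: Sequence[float]) -> List[float]:
    """Toy stand-in for an action head: one linear layer from the conditioned input to an action."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


if __name__ == "__main__":
    weights = [[0.5, 0.0, 0.5]]  # hand-set; a trained model would learn these
    bias = [0.0]
    for speed in (0.5, 1.0, 2.0):
        x = conditioned_vector(PolicyInput(features=[1.0, 0.0], speed=speed))
        print(speed, linear_head(x, weights, bias))
```

The point of the sketch is only the interface: the same observation with a different speed value yields a different action, so one set of weights can serve several speeds.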

Section 04

Experimental Validation: Results from Simulation to Real World


Bidirectional Speed Control

Low-risk transition phase: Fast movement saves time
High-risk contact phase: Slow execution improves success rate

Dynamic Speed Adjustment

Cooperation with a large multimodal model (LMM):

The LMM analyzes the scene to determine the risk level and sends speed commands (e.g., slow down when approaching the target, speed up when moving away from obstacles).
This hierarchical architecture combines high-level scene understanding with low-level motion control, pointing toward more complete end-to-end systems.
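The division of labor in this hierarchy can be sketched with a simple rule standing in for the LMM. This is an assumption-laden toy, not the paper's system: the distance-based risk rule, the 0.05 m contact radius, the 2.0x/0.5x speed commands, and the `replan_every` period are all hypothetical. It shows the slow high-level component issuing speed commands only periodically while the low-level policy keeps using the latest command in between.

```python
from enum import Enum
from typing import List, Sequence


class Risk(Enum):
    LOW = "low"
    HIGH = "high"


# Hypothetical speed commands per risk level (multipliers of the demonstration speed).
SPEED_FOR_RISK = {Risk.LOW: 2.0, Risk.HIGH: 0.5}


def assess_risk(distance_to_target_m: float, contact_radius_m: float = 0.05) -> Risk:
    """Rule-based stand-in for the LMM: anything within contact_radius_m of the target is high risk."""
    return Risk.HIGH if distance_to_target_m <= contact_radius_m else Risk.LOW


def speed_schedule(distances_m: Sequence[float], replan_every: int = 5) -> List[float]:
    """Speed command used at each low-level control step.

    The high-level assessor runs only every `replan_every` steps (it is the slow component);
    between assessments the low-level policy reuses the most recent command.
    """
    if replan_every < 1:
        raise ValueError("replan_every must be >= 1")
    speeds: List[float] = []
    current = 1.0  # placeholder; always overwritten at step 0
    for step, distance in enumerate(distances_m):
        if step % replan_every == 0:
            current = SPEED_FOR_RISK[assess_risk(distance)]
        speeds.append(current)
    return speeds


if __name__ == "__main__":
    approach = [0.30, 0.20, 0.04, 0.03]  # hypothetical distances to the target, in meters
    print(speed_schedule(approach, replan_every=2))
```

With a long replanning period, the robot can enter the contact zone while still holding a fast command from the previous assessment, which is one concrete way the LMM's inference latency shows up.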

Section 05

Technical Contributions and Engineering Significance

Theoretical Aspect

Reveals the direct relationship between per-step motion amplitude and execution speed
Proposes a new paradigm for variable-speed learning that relies mainly on data augmentation rather than on redesigning the model architecture

Engineering Aspect

A single model supports multiple speeds, so there is no need to train a separate model for each speed
Speed conditioning is plug-and-play and easy to integrate into existing VLA architectures
VSTA improves data utilization and boosts performance at the default 1x speed

Application Scenarios

Industrial assembly: Fast approach + slow assembly
Service robots: Dynamically adjust speed based on environmental complexity
Medical robots: Extremely slow execution for high-risk operations, fast movement in transition phases

Section 06

Limitations and Future Research Directions


Current Limitations

The achievable speed range is limited by the coverage of the training data
Generalization to extreme speeds outside the training distribution is poor
Dynamic control relies on LMM scene analysis, which may increase inference latency
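One conservative way to live with the first two limitations at deployment time is to snap an out-of-range speed request to the nearest speed the model was trained on, rather than asking it to extrapolate. The sketch below is hypothetical: the supported set (0.5x, 1x, 2x) is an assumed example, since this summary does not state the speed range actually covered in the paper.

```python
import math
from typing import Sequence

# Hypothetical set of speed factors covered by the augmented training data.
SUPPORTED_SPEEDS = (0.5, 1.0, 2.0)


def snap_to_supported(requested: float, supported: Sequence[float] = SUPPORTED_SPEEDS) -> float:
    """Return the supported speed closest to the request, measured in log2 space.

    Comparing in log space treats a 2x speed-up and a 2x slow-down as equally far from 1x.
    """
    if requested <= 0:
        raise ValueError("speed must be positive")
    target = math.log2(requested)
    return min(supported, key=lambda s: abs(math.log2(s) - target))


if __name__ == "__main__":
    for request in (0.1, 1.2, 4.0):
        print(request, "->", snap_to_supported(request))
```

Snapping trades some speed flexibility for staying inside the training distribution; a deployment that needs finer control would have to widen the augmented speed set instead.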

Future Research

Combine reinforcement learning to optimize speed strategies
Explore self-supervised variable-speed learning without speed labels
Extend to complex robot forms such as humanoid and soft robots
