Zing Forum


VLA-HRM: Innovative Application of Recursive Reasoning Models in Robotic Control

This project applies TRM (Tiny Recursive Model) and HRM (Hierarchical Reasoning Model) to robotic manipulation tasks. Through recursive weight-sharing computation and continuous observation encoding, it outperforms a diffusion policy baseline on the PushT task.

Tags: robot learning, recursive models, imitation learning, reinforcement learning, diffusion policy, robot control, open-source project
Published 2026-03-30 19:03 · Recent activity 2026-03-30 19:23 · Estimated read: 5 min

Section 01

VLA-HRM Project Introduction: Innovative Application of Recursive Reasoning Models in Robotic Control

The VLA-HRM project adapts TRM (Tiny Recursive Model) and HRM (Hierarchical Reasoning Model), originally built for discrete reasoning tasks, to continuous robotic control. On the PushT task (pushing a T-shaped block to a target position), designs such as continuous observation encoding and recursive weight sharing let it outperform the diffusion policy baseline while using far fewer parameters.


Section 02

Background: Challenges from Discrete Reasoning to Continuous Robotic Control

Recursive reasoning models (such as TRM/HRM) were initially used for discrete tasks (sudoku, mazes, etc.). Robotic control (like the PushT task), by contrast, has a continuous observation space (a 5-dimensional state: agent position, block position, block angle) and a continuous action space (a 2-dimensional target position), and requires long-horizon planning under complex contact dynamics. Adapting discrete reasoning models to this continuous control setting is the core challenge of the VLA-HRM project.
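To make the dimensions above concrete, here is a minimal sketch of the PushT state and action layout. The field ordering and coordinate values are illustrative assumptions, not the actual environment API.

```python
import numpy as np

# Hypothetical PushT state layout (ordering is an assumption):
# 5-dimensional continuous observation.
obs = np.array([
    256.0, 300.0,   # agent (pusher) x, y position
    280.0, 260.0,   # T-block x, y position
    1.57,           # T-block rotation angle (radians)
], dtype=np.float32)

# 2-dimensional continuous action: a target position for the agent.
action = np.array([270.0, 265.0], dtype=np.float32)

assert obs.shape == (5,)
assert action.shape == (2,)
```

Both spaces being continuous is what rules out the token-level discrete decoding that TRM/HRM used for sudoku-style tasks.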


Section 03

Evolution of Technical Solutions and Core Model Architecture

The project went through three iterations: V1 (discrete observation/action, failed) → V2 (continuous observation + discrete action, partially successful) → V3 (fully continuous, breakthrough). In the core architecture, TRM uses a recursive, weight-sharing design (a single module handles both the high and low levels, which is memory-efficient), while HRM introduces an explicit hierarchy (high-level planning, low-level control). A key innovation is the use of action query tokens, which support decoding all actions in parallel.
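The recursive weight sharing and action query tokens described above can be sketched in a few lines. This is a toy numpy illustration under stated assumptions: all weight shapes, the tanh recursion, and the query-conditioning scheme are illustrative, not the project's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not the project's configuration).
obs_dim, act_dim, d_model, horizon = 5, 2, 32, 8

W_enc = rng.normal(size=(obs_dim, d_model)) * 0.1   # continuous observation encoder
W_core = rng.normal(size=(d_model, d_model)) * 0.1  # ONE shared recursive weight matrix
W_act = rng.normal(size=(d_model, act_dim)) * 0.1   # continuous regression head
queries = rng.normal(size=(horizon, d_model))       # learned action query tokens

def policy(obs, n_steps=6):
    obs_tok = obs @ W_enc                  # encode the continuous state
    tokens = queries + obs_tok             # condition every query on the state
    for _ in range(n_steps):               # recursion: W_core is reused each step,
        tokens = np.tanh(tokens @ W_core)  # so depth grows without new parameters
    return tokens @ W_act                  # decode all actions in parallel

actions = policy(rng.normal(size=(obs_dim,)))
assert actions.shape == (horizon, act_dim)
```

The point of the sketch is the parameter accounting: effective depth comes from reapplying the same weights, which is why the recursive models stay small, and each query token yields one action step, so the whole action sequence is produced in a single parallel decode rather than autoregressively.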


Section 04

Training Strategies and Key Optimization Techniques

The project uses several training techniques to improve performance: observation noise augmentation (Gaussian noise to prevent overfitting), geometric feature engineering (21 hand-designed geometric features that inject domain knowledge), data augmentation (mirror symmetry, expanding the data 4x), and iterative refinement (multi-step improvement of action sequences, reaching a score of 0.942 at K=8 refinement steps).
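Two of these augmentations are easy to show in code. The sketch below assumes a state layout of `[agent_x, agent_y, block_x, block_y, angle]` and a square workspace of width 512; both are illustrative assumptions, as is the helper naming.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_obs_noise(obs, sigma=0.01):
    """Gaussian observation noise: perturb the state slightly at train
    time so the policy does not overfit exact demonstration states."""
    return obs + rng.normal(scale=sigma, size=obs.shape)

def mirror_x(obs, action, width=512.0):
    """Mirror-symmetry augmentation: reflect a (state, action) pair
    across the vertical axis. Assumes obs = [ax, ay, bx, by, theta]."""
    m_obs = obs.copy()
    m_obs[[0, 2]] = width - m_obs[[0, 2]]   # flip agent/block x coordinates
    m_obs[4] = -m_obs[4]                    # flip the block angle
    m_act = action.copy()
    m_act[0] = width - m_act[0]             # flip the target x coordinate
    return m_obs, m_act

obs = np.array([100.0, 200.0, 150.0, 250.0, 0.5])
act = np.array([120.0, 210.0])

noisy = add_obs_noise(obs)
m_obs, m_act = mirror_x(obs, act)
assert noisy.shape == obs.shape
assert np.isclose(m_obs[0], 412.0) and np.isclose(m_act[0], 392.0)
```

The mirrored sample is a valid demonstration because PushT's dynamics are symmetric under reflection; composing reflections (and applying them to both state and action consistently) is what multiplies the dataset size.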


Section 05

Experimental Result Analysis and Comparison

Results show: HRM V8 (h=384) achieves an average score of 0.558, outperforming the diffusion policy (0.507) with only 1/8 as many parameters; continuous regression over actions beats discrete quantization; and TRM slightly outperforms HRM under the same configuration (possibly because the PushT task lacks an obvious hierarchical structure).


Section 06

Key Insights and Future Directions

Key insights: continuous representation is crucial for robotic control; recursive architectures suit sequential decision-making; observation augmentation effectively prevents overfitting; geometric priors accelerate learning. Limitations: state input only, single-task specialization, simulation-only evaluation. Future directions: VLA expansion (Vision-Language-Action), multi-task learning, real-robot validation, and combining with diffusion models.