Section 01
SOLE-R1: Using Video Language Reasoning as the Sole Reward Signal for Robot RL (Introduction)
This post introduces SOLE-R1, a video-language reasoning model designed specifically for robot reinforcement learning (RL). It applies spatiotemporal chain-of-thought reasoning to video to produce dense estimates of task progress, and these estimates serve as the sole reward signal. With this signal alone, SOLE-R1 enables robots to learn 24 unseen manipulation tasks from scratch, without ground-truth rewards, demonstrations, or task-specific tuning. It also addresses reward hacking, a common failure when general-purpose vision-language models (VLMs) are used as reward models in RL.
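To make "dense progress estimates as the sole reward" concrete, here is a minimal sketch of one common way to turn per-step progress scores into RL rewards. This introduction does not say how SOLE-R1's outputs are converted into rewards, so the reward rule here (reward equals the change in estimated progress, r_t = p_t - p_{t-1}) is an assumption, and `progress_rewards`, `ProgressFn`, and the toy estimator are hypothetical names standing in for the real model.

```python
from typing import Callable, List, Sequence

# Hypothetical interface (not SOLE-R1's actual API): maps the video prefix
# observed so far plus a task description to an estimated progress value.
ProgressFn = Callable[[Sequence[object], str], float]


def _clip01(x: float) -> float:
    # Keep estimates in [0, 1] so a malformed output cannot yield unbounded reward.
    return min(max(x, 0.0), 1.0)


def progress_rewards(frames: Sequence[object], task: str,
                     progress_fn: ProgressFn) -> List[float]:
    """Convert per-step progress estimates into dense rewards.

    Assumed rule: r_t = p_t - p_{t-1}, where p_t is the clipped progress
    estimate for the video prefix ending at frame t.
    """
    if not frames:
        return []
    rewards: List[float] = []
    prev = _clip01(progress_fn(frames[:1], task))
    for t in range(1, len(frames)):
        # Re-estimate progress on the prefix up to and including frame t.
        cur = _clip01(progress_fn(frames[: t + 1], task))
        rewards.append(cur - prev)
        prev = cur
    return rewards


if __name__ == "__main__":
    # Toy stand-in estimator: each "frame" is a number, progress = last / 10.
    toy = lambda fs, task: fs[-1] / 10.0
    print(progress_rewards([0, 2, 5, 5, 10], "stack the blocks", toy))
```

One reason to reward progress differences rather than raw progress is that the undiscounted return telescopes to p_T - p_0, so a policy cannot accumulate extra return by repeatedly undoing and redoing the same partial progress.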