Section 01
[Introduction] SpatialLadder: A Spatial Reasoning Training Framework for Small Models to Outperform Large Models
The REAL Lab at Zhejiang University proposes SpatialLadder, a three-stage progressive training framework. Using a hierarchical training strategy of perception → understanding → reasoning, the framework enables a 3B-parameter vision-language model (VLM) to outperform GPT-4o and Gemini-2.0-Flash on spatial reasoning tasks. The accompanying paper has been accepted to ICLR 2026. The project has open-sourced the code, the paper, the pre-trained model, the dedicated SpatialLadder-26k dataset, and the SPBench benchmark.