Section 01
SpatialWorld Benchmark: Core Challenges in Spatial Reasoning for Multimodal Agents
SpatialWorld is a new interactive spatial reasoning benchmark for multimodal agents. It integrates 8 heterogeneous simulation backends, covering home environments, travel scenarios, social collaboration, and other settings, and contains 760 manually annotated tasks. Evaluation results show that even GPT-5, currently the strongest closed-source model, achieves an average task success rate of only 17.4%, which reveals significant bottlenecks in the active exploration and long-term planning capabilities of multimodal agents. The benchmark is introduced in a paper posted to arXiv on June 8, 2026 (link: http://arxiv.org/abs/2606.09669v1).