Section 01
[Introduction] A Benchmark for Embodied Navigation in Urban Airspace: Current Status and Challenges of Large Models' Spatial Action Capabilities
This paper evaluates the spatial action capabilities of Large Multimodal Models (LMMs). It constructs a goal-oriented navigation dataset for urban airspace containing 5,037 samples and uses it to systematically assess 17 representative models. The evaluation reveals that current models deviate rapidly after critical decision fork points, and the paper explores four directions for improvement. The results show that although LMMs have preliminary spatial action capabilities, a significant gap from human-level performance remains.