Section 01
[Main Post] Beyond Cartesian Illusion: A New Breakthrough in Spatial Theory of Mind for Multimodal Large Models
This article addresses the "Cartesian Illusion" in the embodied spatial intelligence of multimodal large language models (MLLMs): the models over-rely on text probability distributions and lack an understanding of the 3D topological structure of the physical world. It proposes two components, a cognitive-perceptual bottleneck module and an anchor-based embodied spatial decomposition chain of thought, both aimed at improving model performance on second-order theory-of-mind and embodied-intelligence tasks. The work bears on the cognitive paradigm of embodied AI and on applications such as autonomous driving and robot collaboration.