Section 01
[Introduction] How Can AI Be Taught to Think Visually? A New Breakthrough in Cross-View Spatial Reasoning
The research team proposes View Drop (VDrop), a training method, along with a panoramic visual thinking strategy. These target a key weakness of vision-language models (VLMs) in cross-view spatial reasoning: the models rely on language and, in doing so, lose fine-grained geometric information. The authors report that the approach achieves the best out-of-domain generalization performance.
Source: Paper published on arXiv on May 26, 2026, titled "How and What to Imagine? Visual Thinking in Unified Multimodal Models for Cross-View Spatial Reasoning" (link: http://arxiv.org/abs/2605.27310v1)