Section 01
[Introduction] Core Findings: Chain-of-Thought Prompting Impairs the Visual-Spatial Reasoning of Multimodal Large Models
By evaluating 17 multimodal models on 13 spatial reasoning benchmarks, this paper finds that Chain-of-Thought (CoT) prompting actually lowers visual-spatial reasoning performance rather than improving it. It also shows that the models suffer from serious shortcut learning and visual hallucination. This counterintuitive result challenges the assumption that CoT helps universally in the multimodal domain and suggests directions for future research.