Section 01
Introduction: Overview of the Multiview Spatial Relation Invariance Evaluation Tool
This article introduces multiview-invariance, an evaluation toolset built on ScanNet 3D scenes. The tool generates image pairs of the same scene in which a spatial relation between objects flips because the viewpoint changes, then uses those pairs to assess systematically whether vision-language models (VLMs) reason about spatial relations consistently across views. The result is a benchmark for the 3D spatial reasoning ability of VLMs.
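To make the idea of a view-dependent relation concrete, here is a minimal sketch of how a left/right relation between two objects can flip when the camera moves to the opposite side of the scene. It is an illustration only, not the project's implementation: the function names, the yaw-only camera model, the z-up world frame, and the object coordinates are all assumptions chosen for the example.

```python
import math

def camera_axes(yaw):
    """Forward and right unit vectors for a camera rotated by `yaw`
    (radians) about the vertical axis. Assumes a z-up world frame."""
    forward = (math.cos(yaw), math.sin(yaw), 0.0)
    right = (math.sin(yaw), -math.cos(yaw), 0.0)  # forward x up
    return forward, right

def _sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def lateral_relation(cam_pos, yaw, obj_a, obj_b):
    """Return "left" if obj_a appears left of obj_b from this camera,
    "right" otherwise, or None if either object is behind the camera."""
    forward, right = camera_axes(yaw)
    rel_a, rel_b = _sub(obj_a, cam_pos), _sub(obj_b, cam_pos)
    if _dot(rel_a, forward) <= 0 or _dot(rel_b, forward) <= 0:
        return None
    return "left" if _dot(rel_a, right) < _dot(rel_b, right) else "right"

def is_flipped_pair(view_1, view_2, obj_a, obj_b):
    """True when both views see both objects but disagree on left/right."""
    r1 = lateral_relation(*view_1, obj_a, obj_b)
    r2 = lateral_relation(*view_2, obj_a, obj_b)
    return r1 is not None and r2 is not None and r1 != r2

# Hypothetical object centers (metres, world frame).
chair = (0.0, 1.0, 0.0)
table = (0.0, -1.0, 0.0)

# Two hypothetical viewpoints: (camera position, yaw).
view_front = ((-3.0, 0.0, 1.0), 0.0)      # looking along +x
view_back = ((3.0, 0.0, 1.0), math.pi)    # looking along -x

print(lateral_relation(*view_front, chair, table))
print(lateral_relation(*view_back, chair, table))
print(is_flipped_pair(view_front, view_back, chair, table))
```

A consistent model should answer "left" for the first view and "right" for the second, since the 3D layout is unchanged and only the viewpoint differs. A real pipeline would use full camera poses and visibility checks rather than this yaw-only simplification.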