Section 01
Introduction: Elevation-FS4K — A Diagnostic Benchmark for Multi-View Spatial Reasoning Capabilities of VLMs
Elevation-FS4K is a factorial benchmark for systematically diagnosing the multi-view spatial reasoning capabilities of vision-language models (VLMs). Its scalable test design pinpoints where a model's 3D spatial understanding breaks down, producing a detailed "diagnostic map" that can guide model improvement.