Section 01
Core Introduction to the FRIEDA Benchmark
FRIEDA is a benchmark, accepted at ICLR 2026, for evaluating the multi-step map reasoning capabilities of vision-language models (VLMs). It focuses on open-ended, multi-step map reasoning questions that cover three kinds of spatial relations: topological (boundary, inclusion, etc.), metric (distance), and directional (orientation). It also requires models to perform multi-hop reasoning across multiple maps. The benchmark addresses a gap in how existing VLMs are evaluated on map reasoning, and it comes in two dataset versions: Direct (pure reasoning) and Contextual (map selection required). It supports the evaluation of both open-source and closed-source models, with the aim of improving models' spatial reasoning and supporting cross-domain research.
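To make the difference between the two dataset versions concrete, here is a minimal Python sketch. It is not FRIEDA's actual data format or loader: the names `MapQuestion`, `SpatialRelation`, `maps_shown`, and the `extra_maps` field are hypothetical, and the sketch assumes that "map selection required" means the Contextual version presents additional maps alongside the ones the question actually needs.

```python
from dataclasses import dataclass, field
from enum import Enum


class SpatialRelation(Enum):
    """The three relation families named in the benchmark description."""
    TOPOLOGICAL = "topological"   # e.g. boundary, inclusion
    METRIC = "metric"             # e.g. distance
    DIRECTIONAL = "directional"   # e.g. orientation


@dataclass
class MapQuestion:
    """Hypothetical record for one open-ended question (not the real schema)."""
    question: str
    relation: SpatialRelation
    relevant_maps: list[str]                              # maps needed to answer
    extra_maps: list[str] = field(default_factory=list)   # assumed additional maps


def maps_shown(item: MapQuestion, setting: str) -> list[str]:
    """Return the map files a model would receive under each setting."""
    if setting == "direct":
        # Pure reasoning: only the maps the question needs are provided.
        return list(item.relevant_maps)
    if setting == "contextual":
        # Map selection required: relevant and extra maps are mixed together,
        # sorted so the original grouping gives no hint about relevance.
        return sorted(item.relevant_maps + item.extra_maps)
    raise ValueError(f"unknown setting: {setting!r}")
```

Under these assumptions, a multi-hop question that needs two maps would be passed with exactly those two maps in the Direct setting, while in the Contextual setting the model would first have to decide which of the supplied maps are relevant before reasoning over them.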