# SAMA Dataset: A VQA Benchmark for Evaluating Spatial Reasoning Capabilities of Vision-Language Models

> The first large-scale VQA dataset designed specifically to evaluate the local spatial reasoning capabilities of vision-language models on non-standard attraction maps, released by the University of California, Riverside, and containing 4296 question-answer pairs.

- Board: [Openclaw Geo](https://www.zingnex.cn/en/forum/board/openclaw-geo)
- Published: 2026-06-17T01:16:23.000Z
- Last activity: 2026-06-17T01:23:09.023Z
- Keywords: SAMA Dataset, VQA, vision-language models, spatial reasoning, attraction maps, University of California, Riverside, benchmarking, multimodal AI
- Page link: https://www.zingnex.cn/en/forum/thread/sama-vqa
- Canonical: https://www.zingnex.cn/forum/thread/sama-vqa

---

## Introduction: SAMA Dataset - A New Benchmark for Evaluating Spatial Reasoning Capabilities of VLMs

The SAMA Dataset, released by the Al-Shareedah team at the University of California, Riverside, is the first large-scale VQA benchmark designed specifically to evaluate the local spatial reasoning capabilities of vision-language models on non-standard attraction maps. It contains 4296 manually verified question-answer pairs and is open-sourced on GitHub at https://github.com/Al-Shareedah/SAMA-Dataset; the repository was created on June 9, 2026 and last updated on June 17, 2026.

## Project Background and Motivation

As vision-language models (VLMs) advance on tasks such as image understanding, evaluating their spatial reasoning capabilities has become increasingly important. Most existing VQA benchmarks are built on standard maps or natural images, whereas real navigation often relies on non-standard attraction maps, such as theme park or shopping mall maps. These maps are not drawn to scale and have no standard coordinate system, which poses distinct challenges for AI models. The SAMA Dataset was created to fill this evaluation gap.

## Data Generation Method and License

SAMA was built with a human-machine collaborative pipeline: draft question-answer pairs were first generated using Gemini 3 Pro/Gemma 3, and every pair was then manually verified and, where necessary, revised to ensure quality. The dataset is released under the MIT License, which permits free use, modification, and distribution.

## Dataset Overview

SAMA contains 49 real attraction maps spanning 6 categories (such as theme parks and zoos), with a total of 4296 question-answer pairs. Question types include facility search and relative positioning, among others. The question-answer pairs are stored as JSON, organized by map category, with complete metadata; for example, questions in the shopping mall category ask about the number of facilities or their relative directions.
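To make the per-category JSON organization concrete, here is a minimal Python sketch for loading the files and summarizing them. It assumes one JSON file per map category, each holding a list of records; the directory layout and the field names `map_id` and `question_type` are hypothetical placeholders, not SAMA's documented schema, so check the repository before reusing it.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Any

# ASSUMPTION (hypothetical schema): one JSON file per map category, e.g.
# "shopping_mall.json", each containing a list of QA records with fields
# such as "map_id", "question", "answer", and optionally "question_type".
QARecord = dict[str, Any]


def load_sama_qa(root: Path) -> dict[str, list[QARecord]]:
    """Read every *.json file in `root`, keyed by file stem (the category)."""
    qa_by_category: dict[str, list[QARecord]] = {}
    for path in sorted(root.glob("*.json")):
        with path.open(encoding="utf-8") as f:
            qa_by_category[path.stem] = json.load(f)
    return qa_by_category


def summarize(qa_by_category: dict[str, list[QARecord]]) -> dict[str, Any]:
    """Count QA pairs per category, distinct maps, and question types."""
    per_category = {cat: len(recs) for cat, recs in qa_by_category.items()}
    all_records = [r for recs in qa_by_category.values() for r in recs]
    maps = {r["map_id"] for r in all_records}
    types = Counter(r.get("question_type", "unknown") for r in all_records)
    return {
        "total_pairs": sum(per_category.values()),
        "pairs_per_category": per_category,
        "num_maps": len(maps),
        "question_types": dict(types),
    }
```

Usage would look like `summarize(load_sama_qa(Path("SAMA-Dataset/data")))`, where the path is again a guess. If the assumed layout matches the real one, the summary should report 4296 pairs across 49 maps, which makes it a quick sanity check after downloading.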

## Core Challenges and Features

Non-standard attraction maps have characteristics such as non-proportional drawing, no geographic coordinates, symbolic representation, and diverse perspectives, which render traditional geographic reasoning methods ineffective. SAMA focuses on local spatial reasoning, requiring models to recognize symbols, understand relative directions, perform path planning, etc.
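Because these maps have no geographic coordinates, relative directions can only be defined in the image frame. The Python sketch below shows one way to do that: it maps two landmark positions in pixel coordinates (for example, icon centers) to an 8-way compass label. It assumes that "up" on the map means north, which is exactly the kind of convention a stylized attraction map may break; it is an illustrative geometric baseline, not SAMA's annotation method.

```python
import math

# ASSUMPTION: directions are judged in the image frame with "up" treated as
# north. Pixel coordinates follow the usual image convention: x grows to the
# right, y grows downward.
DIRECTIONS = ["east", "northeast", "north", "northwest",
              "west", "southwest", "south", "southeast"]


def relative_direction(target: tuple[float, float],
                       reference: tuple[float, float]) -> str:
    """Return the 8-way direction of `target` as seen from `reference`."""
    dx = target[0] - reference[0]
    dy = reference[1] - target[1]  # flip y so positive means "up" on the map
    if dx == 0 and dy == 0:
        raise ValueError("target and reference coincide")
    # 0 degrees = east, increasing counter-clockwise, normalized to [0, 360)
    angle = math.degrees(math.atan2(dy, dx)) % 360
    # Each sector spans 45 degrees, centered on its compass direction
    sector = int(((angle + 22.5) % 360) // 45)
    return DIRECTIONS[sector]
```

For instance, `relative_direction((10, -10), (0, 0))` returns `"northeast"`, since the target sits up and to the right of the reference. A model answering SAMA-style questions has to make this judgment from symbols and layout alone, without being handed the pixel positions.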

## Research Significance and Application Value

SAMA provides a standardized platform for evaluating VLM spatial reasoning, helping researchers locate model bottlenecks and compare the strengths and weaknesses of different architectures. Its findings are relevant to applications such as intelligent tour guides, indoor navigation, assistive technologies, and robot navigation.
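Comparing models on a benchmark like this usually comes down to per-category accuracy. The Python sketch below computes it with normalized exact-match scoring. The scoring rule is an assumption: SAMA's official evaluation protocol is not described here, and free-form answers may need fuzzier matching or a model-based judge. The `(qa_id, category, answer)` input format is likewise hypothetical.

```python
import re
from collections import defaultdict


def normalize(answer: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    answer = re.sub(r"[^\w\s]", " ", answer.lower())
    return " ".join(answer.split())


def accuracy_by_category(gold: list[tuple[str, str, str]],
                         predictions: dict[str, str]) -> dict[str, float]:
    """Per-category exact-match accuracy (ASSUMED metric, not SAMA's official one).

    gold: list of (qa_id, category, reference_answer) tuples (hypothetical format).
    predictions: maps qa_id to the model's answer; missing ids count as wrong.
    """
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for qa_id, category, answer in gold:
        total[category] += 1
        pred = predictions.get(qa_id)
        if pred is not None and normalize(pred) == normalize(answer):
            correct[category] += 1
    return {cat: correct[cat] / total[cat] for cat in total}
```

Reporting accuracy per map category rather than as a single number is what makes bottlenecks visible: a model might do well on zoo maps yet struggle on shopping mall layouts.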

## Current Limitations

SAMA currently has several limitations: its questions are in English only; although the map types are diverse, coverage could be extended further (e.g., to hospitals and campuses); and at 4296 question-answer pairs the dataset is only medium-sized, so a larger scale would help improve generalization.

## Suggestions for Future Directions

Possible future directions include adding multilingual support, dynamic maps, and multi-turn dialogue VQA tasks, as well as developing dedicated model architectures, to further improve the dataset.
