PlanBench-V: The First Visual-Language Model Evaluation Benchmark for Spatial Planning Diagrams

PlanBench-V is the first comprehensive benchmark specifically designed to evaluate the ability of visual-language models (VLMs) to interpret spatial planning diagrams. By constructing an expert-annotated dataset containing 223 planning diagrams and 1629 question-answer pairs, it reveals the capability boundaries of current VLMs across four dimensions: perception, reasoning, association, and implementation.

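Tags: Vision-Language Models, Spatial Planning, Urban Planning, Multimodal Evaluation, Benchmarking, Geographic Information Systems, Spatial Reasoning, Domain Adaptation, AI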
Published 2026-06-04 14:17 | Recent activity 2026-06-05 16:49 | Estimated read 6 min

Section 01

[Introduction] PlanBench-V: The First VLM Evaluation Benchmark for Spatial Planning Diagrams Released

PlanBench-V is the first comprehensive benchmark for evaluating how well visual-language models (VLMs) interpret spatial planning diagrams. The authors released it on arXiv on June 4, 2026 (link: http://arxiv.org/abs/2606.05744v1). The benchmark comprises 223 planning diagrams and 1629 expert-annotated question-answer pairs, probes the capability boundaries of current VLMs through a four-dimensional framework (perception, reasoning, association, implementation), and comes with open-source code and data (https://plangpt.github.io).

Section 02

Background & Problem: Challenges in Interpreting Spatial Planning Diagrams and Limitations of Existing Benchmarks

Spatial planning diagrams are core tools for land governance. Reading them requires fine-grained visual perception, spatial reasoning, and professional policy judgment, which is demanding for both humans and AI. Existing multimodal benchmarks focus on general visual tasks and overlook the cognitive processes specific to planning practice, such as weighing policy implications and regulatory constraints. As a result, no specialized evaluation benchmark for spatial planning diagrams existed.

Section 03

Core Methods: SPMD Dataset and Four-Dimensional Evaluation Framework

1. Spatial Planning Map Database (SPMD)

SPMD contains 223 real planning diagrams covering different regions and styles, plus 1629 multi-level question-answer pairs (about 7.3 per diagram on average) written by domain experts so that the questions reflect the cognitive challenges of planning practice.
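To make the dataset description concrete, here is a minimal sketch of how diagram-linked question-answer pairs might be represented and grouped. The `QAPair` schema, its field names, and the toy records are all assumptions made for illustration; this summary does not describe the actual SPMD file format.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class QAPair:
    """One expert-written question about one planning diagram (hypothetical schema)."""
    diagram_id: str  # hypothetical identifier of the source diagram
    level: str       # hypothetical label for the question's level
    question: str
    answer: str


def group_by_diagram(pairs):
    """Return {diagram_id: [QAPair, ...]}, keeping the input order."""
    grouped = defaultdict(list)
    for pair in pairs:
        grouped[pair.diagram_id].append(pair)
    return dict(grouped)


def mean_pairs_per_diagram(pairs):
    """Average number of questions attached to each diagram."""
    grouped = group_by_diagram(pairs)
    return mean(len(items) for items in grouped.values())


# Toy records, invented for illustration only (not taken from SPMD).
toy_pairs = [
    QAPair("map_001", "perception", "Which land use is shaded green?", "park"),
    QAPair("map_001", "reasoning", "Is plot A adjacent to the main road?", "yes"),
    QAPair("map_002", "perception", "What does the red boundary mark?", "plot boundary"),
]

print(mean_pairs_per_diagram(toy_pairs))
```

Applied to the reported totals, the same average would be 1629 / 223, roughly 7.3 questions per diagram.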

2. Four-Dimensional Evaluation Framework

  • Perception: Recognize basic visual elements such as plot boundaries and land use types;
  • Reasoning: Work out spatial logical relationships, for example by calculating distances and analyzing connectivity;
  • Association: Link visual information with policy implications (regulatory constraints, development intensity, etc.);
  • Implementation: Make evaluative judgments and policy-sensitive decisions (the highest level).
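One way to report results along the four dimensions above is a per-dimension accuracy table. The sketch below assumes exact-match scoring after light normalization and a hypothetical `(dimension, gold_answer, model_answer)` tuple layout; the metric actually used by PlanBench-V is not described in this summary, and the toy records are invented.

```python
from collections import defaultdict

DIMENSIONS = ("perception", "reasoning", "association", "implementation")


def normalize(text):
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def accuracy_by_dimension(records):
    """Compute exact-match accuracy per dimension.

    records: iterable of (dimension, gold_answer, model_answer) tuples
    (a hypothetical layout). Dimensions with no records are omitted.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for dimension, gold, predicted in records:
        if dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {dimension!r}")
        total[dimension] += 1
        if normalize(gold) == normalize(predicted):
            correct[dimension] += 1
    return {d: correct[d] / total[d] for d in DIMENSIONS if total[d]}


# Toy values for illustration only; these are not results from the paper.
toy_records = [
    ("perception", "residential", "Residential"),
    ("perception", "park", "park"),
    ("reasoning", "2", "2"),
    ("reasoning", "north", "south"),
    ("implementation", "not compliant", "compliant"),
]

scores = accuracy_by_dimension(toy_records)
print(scores)
```

Exact match is the simplest possible scorer. Open-ended answers, which judgment-style implementation questions are likely to produce, would probably need a more tolerant comparison.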

Section 04

Experimental Findings: Generational Progress of VLMs and Bottlenecks in Implementation Tasks

  1. Significant generational progress: Qwen3.6-Plus, the best-performing 2026 model, improved overall performance by 27% over 2025's GPT-4o;
  2. Bottleneck in implementation tasks: All models performed poorly on implementation tasks (evaluation judgment, policy sensitivity, constraint-based decision-making), reflecting fundamental limitations in professional planning contexts;
  3. Need for domain-adaptive frameworks: General-purpose VLMs need to be adapted with domain knowledge before they can handle professional tasks.

Section 05

Technical Implementation and Open Resources

The research team has open-sourced the code and dataset at https://plangpt.github.io. These resources support reproducing the experiments, developing new models, expanding the dataset, and establishing finer-grained evaluation metrics.

Section 06

Industry Implications: Planning Practice, Model Development, and Policy Making

  • Urban planning practice: Provides an evaluation basis for the reliability of AI-assisted planning tools;
  • Model development: Points VLM development toward a deeper understanding of professional domains;
  • Policy making: Provides a risk assessment framework for AI deployment in applications like smart cities.

Section 07

Future Outlook: Development Directions for Intelligent Planning Assistants

Key directions for future efforts:

  1. Multimodal fusion (integrating remote sensing, 3D models, and real-time data);
  2. Interactive reasoning (collaborative analysis between planners and AI);
  3. Interpretability (transparently presenting the reasoning process);
  4. Continuous learning (improving the system based on practical feedback).

PlanBench-V serves as a bridge between AI research and planning practice, providing a technical roadmap for the future of smart cities.