Section 01
GS-QA Benchmark Guide: A New Tool for Large-Scale Geospatial Question Answering Evaluation
GS-QA is an evaluation benchmark for geospatial question answering. It contains 2,800 question-answer pairs across 28 question templates, supports multi-source reasoning and several answer types, and provides a framework for evaluating the geospatial reasoning capabilities of large language models (LLMs). The benchmark is built on OpenStreetMap and Wikipedia data and addresses the limitations of existing benchmarks in scale, diversity, and cross-source reasoning.