# TurnBack: Evaluating Geospatial Cognitive Ability of Large Language Models via Reverse Path Tasks

> TurnBack is a benchmark that evaluates the geospatial reasoning and navigation abilities of large language models through reverse path tasks, exposing both the strengths and the limitations of current models' spatial understanding.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-04-05T19:11:03.000Z
- Last activity: 2026-04-05T19:18:41.384Z
- Popularity: 150.9
- Keywords: geospatial cognition, large language models, benchmarking, spatial reasoning, navigation, EMNLP, path planning, embodied intelligence
- Page link: https://www.zingnex.cn/en/forum/thread/turnback
- Canonical: https://www.zingnex.cn/forum/thread/turnback

---

## Introduction: The TurnBack Benchmark

TurnBack assesses the geospatial reasoning and navigation abilities of large language models through reverse path tasks. The benchmark has been accepted at EMNLP 2025. Its central idea is the "reverse path" paradigm, which probes whether a model genuinely understands spatial relationships. This article covers the background, methodology, experimental findings, error analysis, and future directions.

## Background: Spatial Intelligence and the Spatial Cognition Challenge for Large Language Models

Geospatial cognition, which spans understanding spatial relationships, planning paths, and remembering places, is a core part of human intelligence and is crucial for AI systems that must interact naturally with people and make autonomous decisions. Large language models have made major advances in text understanding and generation, but whether they possess genuine spatial cognition remains an open question. TurnBack is designed to evaluate this ability systematically.

## Methodology: The Reverse Path Paradigm

The core idea of TurnBack is its "reverse path" testing paradigm: given a description of a route from point A to point B, the model must produce the reverse route from B back to A. This is more than flipping directions. The model has to understand the relative positions of landmarks, identify which road segments are reversible and which are not (e.g., one-way streets), and convert turn instructions: the decision points are visited in the opposite order, and a left turn on the way out becomes a right turn on the way back. This design aims to separate models with genuine spatial understanding from those that rely on surface pattern matching.
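To make the turn-conversion requirement concrete, here is a minimal Python sketch of a correct reversal under a deliberately simplified route model. The `Step` type, its fields, and `reverse_route` are hypothetical illustrations, not part of TurnBack's released code; the sketch assumes a route is a list of turn decisions at named landmarks, with a flag marking one-way segments.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Step:
    """Hypothetical route step: an action taken at a landmark."""
    action: str            # "left", "right", or "straight"
    landmark: str          # landmark where the action happens
    one_way: bool = False  # assumed: True if the segment leaving this landmark is one-way


# Mirroring a turn: what was a left on the way out is a right on the way back.
MIRROR = {"left": "right", "right": "left", "straight": "straight"}


def reverse_route(steps: List[Step]) -> Optional[List[Step]]:
    """Return the B -> A route, or None if any segment cannot be traversed backwards."""
    if any(s.one_way for s in steps):
        return None  # an irreversible segment means the route cannot simply be inverted
    # Visit the decision points in reverse order AND mirror each turn.
    return [Step(MIRROR[s.action], s.landmark) for s in reversed(steps)]


forward = [Step("left", "bank"), Step("right", "park gate"), Step("straight", "library")]
backward = reverse_route(forward)
```

Both operations are needed: reversing the order alone (giving straight, right, left) or mirroring the turns alone (giving right, left, straight) yields a wrong route, which is the kind of shortcut a pattern-matching model might take.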

## Methodology: Dataset Construction and Task Hierarchy Design

The TurnBack dataset is built following linguistic principles and geographic information science standards, and draws on real-world navigation scenarios (urban streets, parks, indoor spaces, etc.). Each sample contains the original path description, the reverse path description, and structured verification information. Tasks are stratified by difficulty, from simple straight paths to complex multi-turn routes and from familiar to unfamiliar environments, so that model performance can be compared across levels of complexity.
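The article does not publish the exact sample schema, so the following is only a hypothetical Python sketch of how a sample with the three described components, plus difficulty metadata, might be represented. Field names such as `turns` and `familiar_environment`, and the turn-count thresholds in `difficulty()`, are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Difficulty(Enum):
    EASY = 1    # straight path, no turns
    MEDIUM = 2  # one or two turns
    HARD = 3    # three or more turns


@dataclass
class TurnBackSample:
    """Hypothetical sample layout; not the official TurnBack schema."""
    original_path: str  # A -> B description given to the model
    reverse_path: str   # gold B -> A description
    # Structured verification info (assumed form): gold turn sequence and landmarks.
    turns: List[str] = field(default_factory=list)
    landmarks: List[str] = field(default_factory=list)
    scene: str = "urban street"        # e.g. "park", "indoor"
    familiar_environment: bool = True  # familiar vs. unfamiliar setting

    def difficulty(self) -> Difficulty:
        # Assumed rule: difficulty grows with the number of actual turns.
        n_turns = sum(1 for t in self.turns if t in ("left", "right"))
        if n_turns == 0:
            return Difficulty.EASY
        if n_turns <= 2:
            return Difficulty.MEDIUM
        return Difficulty.HARD
```

Keeping the familiarity flag separate from the turn-based level lets the two axes of difficulty be analyzed independently.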

## Experimental Findings: Current State of Spatial Cognitive Ability in Large Language Models

TurnBack evaluates outputs along several dimensions: general text-similarity metrics (BLEU, ROUGE) and spatial-task-specific metrics (path accuracy, turn accuracy, landmark recognition rate). The experiments yield three main results: current mainstream large language models perform far below human level; spatial reasoning ability improves with model size, but not linearly; and models struggle noticeably with specific spatial relationships such as relative direction and distance estimation.
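The article names the spatial metrics but does not define them. One plausible set of definitions is sketched below in Python; the function names and formulas are assumptions for illustration, not TurnBack's official evaluation code.

```python
from typing import Sequence


def turn_accuracy(pred: Sequence[str], ref: Sequence[str]) -> float:
    """Fraction of reference turns reproduced at the same position (assumed definition)."""
    if not ref:
        return 1.0 if not pred else 0.0
    hits = sum(p == r for p, r in zip(pred, ref))
    return hits / len(ref)


def path_accuracy(pred: Sequence[str], ref: Sequence[str]) -> float:
    """1.0 only if the whole turn sequence matches exactly (assumed definition)."""
    return 1.0 if list(pred) == list(ref) else 0.0


def landmark_recall(pred_text: str, ref_landmarks: Sequence[str]) -> float:
    """Share of reference landmarks mentioned in the generated text (assumed definition)."""
    if not ref_landmarks:
        return 1.0
    text = pred_text.lower()
    found = sum(1 for lm in ref_landmarks if lm.lower() in text)
    return found / len(ref_landmarks)
```

For example, predicting `["straight", "right", "right"]` against the reference `["straight", "left", "right"]` scores a turn accuracy of 2/3 but a path accuracy of 0. Case-insensitive substring matching for landmarks is deliberately crude; a real evaluator would likely need alias and paraphrase handling.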

## Error Analysis: Systematic Limitations of Spatial Cognition in Large Language Models

A detailed error analysis reveals systematic limitations. Common errors include direction confusion (swapping left and right), misjudged distances, topological errors (wrongly judging which landmarks connect to which), and failure to recognize irreversible road segments. This indicates that the models have not built a flexible internal spatial representation and rely on textual pattern matching more than on spatial reasoning.
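Some of these error categories can be tallied automatically. The hypothetical Python sketch below works at the turn level only, separating left/right swaps (direction confusion) from other turn mismatches and from dropped or extra steps. The category names and the position-by-position alignment are simplifying assumptions; distance and topological errors would need richer annotations than a bare turn list.

```python
from collections import Counter
from typing import Sequence

# (predicted, reference) pairs that count as a left/right swap.
LR_SWAPS = {("left", "right"), ("right", "left")}


def classify_turn_errors(pred: Sequence[str], ref: Sequence[str]) -> Counter:
    """Tally turn-level errors by category (hypothetical taxonomy)."""
    errors: Counter = Counter()
    # Compare position by position over the overlapping prefix.
    for p, r in zip(pred, ref):
        if p == r:
            continue
        if (p, r) in LR_SWAPS:
            errors["direction_confusion"] += 1
        else:
            errors["other_turn_error"] += 1
    # Steps the model dropped or added beyond the overlap.
    if len(pred) != len(ref):
        errors["length_mismatch"] = abs(len(pred) - len(ref))
    return errors
```

A model that reverses the step order but forgets to mirror the turns shows up here as a high `direction_confusion` count: for the reference `["straight", "left", "right"]`, the unmirrored answer `["straight", "right", "left"]` produces two such errors.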

## Application Value and Future Research Directions

TurnBack has both academic and practical value. It provides a unified standard for evaluating spatial cognition in models, which can guide model optimization for applications such as navigation systems and intelligent assistants, and it exposes potential limitations of large language models for embodied intelligence. The project is fully open source, including the dataset, evaluation code, and framework. Future directions include expanding the dataset, developing architectures dedicated to spatial reasoning, exploring multimodal fusion, and injecting spatial knowledge into pre-trained models.
