# CARTE Benchmark: Exposing Systemic Blind Spots of Large Models in French Regional Knowledge

> The CARTE benchmark uses 2,431 questions covering 13 regions and 14 thematic areas in France to reveal significant performance gaps and pre-training coverage deficits of large models in regional-level geographic knowledge.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-06-01T09:50:50.000Z
- Last activity: 2026-06-02T03:27:51.632Z
- Popularity: 142.4
- Keywords: CARTE, regional knowledge, cultural understanding, benchmark, France, linguistic diversity, LLM evaluation, geographic knowledge
- Page link: https://www.zingnex.cn/en/forum/thread/carte
- Canonical: https://www.zingnex.cn/forum/thread/carte

---


The study behind CARTE evaluated 27 LLMs on 2,431 questions spanning 13 French regions and 14 thematic areas. Models perform well on mainstream regions (e.g., the Paris region) but poorly on remote or culturally distinct regions (e.g., Corsica, Brittany), and they also struggle with dialects and fine-grained cultural knowledge. The authors read this pattern as evidence of systemic bias in pre-training data.

## Research Background: Key Blind Spots in Large Models' Regional Cultural Understanding

Large language models have made significant progress in cultural understanding at the national level, but they still miss subtle differences at the regional level (e.g., Provence vs. Brittany in France). Existing benchmarks mostly focus on cross-country comparisons, language proficiency, or general knowledge, and they largely ignore differences between regions of the same country. As a result, they cannot measure how well a model understands a country's internal diversity.

## CARTE Benchmark: A Fine-Grained Regional Knowledge Evaluation Tool

CARTE (Culturally Anchored Regional-Territorial Evaluation) is designed specifically to assess what LLMs know about French regions. France was chosen for its long history, its linguistic diversity (regional languages such as Breton), and its clear geographical and administrative divisions. The benchmark contains 2,431 multiple-choice questions covering 13 metropolitan regions and 14 themes (culture, language, economy, etc.). A subset, CARTE-LV, focuses on language variants: dialects, regional terms, and language policies.
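To make the structure described above concrete, here is a minimal sketch of how one such multiple-choice item could be represented and turned into a few-shot prompt. The `CarteItem` schema, its field names, the prompt layout, and the two sample questions are all assumptions made for illustration; the post does not describe CARTE's actual data format or prompt template.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CarteItem:
    """One multiple-choice question (hypothetical schema, not CARTE's real format)."""
    question: str
    options: tuple[str, ...]
    answer_index: int               # position of the correct option
    region: str                     # one of the 13 metropolitan regions
    theme: str                      # one of the 14 thematic areas
    is_language_variant: bool = False  # assumed flag for a CARTE-LV-style subset

    def __post_init__(self):
        if not 0 <= self.answer_index < len(self.options):
            raise ValueError("answer_index must point at one of the options")


def format_item(item, with_answer=False):
    """Render an item as lettered options, optionally including the gold answer."""
    letters = "ABCDEFGH"
    lines = [item.question]
    lines += [f"{letters[i]}. {opt}" for i, opt in enumerate(item.options)]
    answer = f" {letters[item.answer_index]}" if with_answer else ""
    lines.append("Answer:" + answer)
    return "\n".join(lines)


def build_few_shot_prompt(examples, target):
    """Solved examples first, then the unsolved target question."""
    blocks = [format_item(ex, with_answer=True) for ex in examples]
    blocks.append(format_item(target))
    return "\n\n".join(blocks)


# Illustrative items written for this sketch, not taken from the benchmark.
shot = CarteItem(
    question="Which city is the prefecture of the Bretagne region?",
    options=("Rennes", "Nantes", "Brest", "Quimper"),
    answer_index=0,
    region="Bretagne",
    theme="geography",
)
target = CarteItem(
    question="Ajaccio is the administrative capital of which region?",
    options=("Normandie", "Corse", "Occitanie", "Grand Est"),
    answer_index=1,
    region="Corse",
    theme="geography",
)

prompt = build_few_shot_prompt([shot], target)
print(prompt)
```

Keeping `region` and `theme` on every item is what would later allow accuracy to be broken down along either axis.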

## Experimental Results: Regional and Thematic Differences in Model Performance

Evaluating 27 LLMs with 1B to 12B parameters in few-shot settings revealed four patterns:

1. Scale helps, but with diminishing marginal returns, and even the largest models evaluated have not saturated the benchmark.
2. Accuracy varies significantly by region: it is high for the Paris region and low for remote regions such as Corsica.
3. Accuracy varies by theme: models do well on general knowledge (geography, history) and poorly on fine-grained culture (dialects, traditions).
4. On CARTE-LV, models struggle to recognize dialects, regional terms, and language policies.
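The regional and thematic gaps listed above come from slicing accuracy by group. A minimal sketch of that breakdown follows, assuming each scored answer is a dict with `region`, `theme`, `gold`, and `predicted` keys. Those key names and the toy records are hypothetical and are not results from the study.

```python
from collections import defaultdict


def accuracy_by(records, key):
    """Return {group: accuracy} for the grouping field `key` (e.g. 'region')."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for r in records:
        totals[r[key]] += 1
        correct[r[key]] += int(r["predicted"] == r["gold"])
    return {group: correct[group] / totals[group] for group in totals}


# Made-up records for illustration only.
records = [
    {"region": "Île-de-France", "theme": "history", "gold": "A", "predicted": "A"},
    {"region": "Île-de-France", "theme": "dialects", "gold": "C", "predicted": "C"},
    {"region": "Corse", "theme": "history", "gold": "B", "predicted": "B"},
    {"region": "Corse", "theme": "dialects", "gold": "D", "predicted": "A"},
]

by_region = accuracy_by(records, "region")
by_theme = accuracy_by(records, "theme")
print(by_region)
print(by_theme)
```

The same function serves both axes, so a per-region and a per-theme table can be produced from one pass of model outputs.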

## In-depth Analysis: Systemic Biases in Pre-training Data

The results point to gaps in pre-training data coverage:

1. The data is biased towards mainstream regions (the capital and economic centers), standard languages, and popular topics.
2. Models cannot learn what is absent from their training data, so coverage gaps carry over into model behavior and amplify existing biases. Long-tail knowledge, such as regional details, is especially hard to acquire.
3. Models show limited robustness to variation within a country and easily confuse similar regions.

## Technical Methods: Design and Quality Control of the CARTE Benchmark

CARTE questions follow four design principles: they are geographically anchored, discriminative, multi-grained, and objective. Quality control includes expert validation, multiple rounds of proofreading, and balanced coverage across regions and themes. Evaluation metrics include overall, per-region, and per-theme accuracy, as well as confusion matrices.
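The post does not say what the confusion matrices are computed over. As one plausible reading, the sketch below assumes questions whose answer options are region names and counts how often each gold region is answered with each predicted region. The function names and the toy pairs are hypothetical.

```python
from collections import Counter


def region_confusion(pairs):
    """Build a confusion matrix from a list of (gold_region, predicted_region) pairs.

    Rows are gold regions, columns are predicted regions, both in sorted order.
    """
    counts = Counter(pairs)
    labels = sorted({label for pair in pairs for label in pair})
    matrix = [[counts[(gold, pred)] for pred in labels] for gold in labels]
    return labels, matrix


def most_confused(pairs, top=3):
    """Return the most frequent (gold, predicted) mistakes, ignoring correct answers."""
    errors = Counter((gold, pred) for gold, pred in pairs if gold != pred)
    return errors.most_common(top)


# Made-up predictions for illustration only.
pairs = [
    ("Bretagne", "Bretagne"),
    ("Bretagne", "Normandie"),
    ("Bretagne", "Normandie"),
    ("Normandie", "Normandie"),
    ("Corse", "Corse"),
    ("Corse", "Bretagne"),
]

labels, matrix = region_confusion(pairs)
for label, row in zip(labels, matrix):
    print(f"{label:<10} {row}")
print(most_confused(pairs, top=1))
```

Off-diagonal cells with high counts are where a model mixes up two regions, which is the kind of signal the "easily confuse similar regions" observation relies on.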

## Significance and Implications: Recommendations for Model Development and Evaluation

- For developers: pre-training data needs better geographic diversity, regional balance, linguistic inclusivity, and long-tail knowledge coverage.
- For the evaluation community: CARTE adds region-level granularity as a new evaluation dimension, and the approach can be extended to other countries.
- For society: missing regional knowledge may lead to cultural neglect, representational bias, and fairness issues.

## Conclusions and Future Directions: Promoting Balanced Development of LLMs' Regional Cultural Understanding

- Core conclusion: current LLMs have systemic gaps in pre-training coverage, with insufficient knowledge of non-mainstream regions, dialects, and fine-grained culture.
- Limitations: the benchmark covers only France and French.
- Future directions: expansion to other countries and languages, dynamic updates, and adversarial testing.

CARTE offers a template for regional cultural evaluation, and similar work for other countries would help promote more balanced regional knowledge in LLMs.
