Section 01
CARTE Benchmark: Exposing Systemic Blind Spots of Large Models in French Regional Knowledge
The CARTE benchmark comprises 2,431 questions spanning 13 French regions and 14 thematic areas. It is designed to expose performance disparities and pre-training coverage gaps in large models' region-level geographic knowledge. The study evaluated 27 LLMs and found that they perform well on mainstream regions (e.g., Paris) but poorly on remote or culturally distinctive regions (e.g., Corsica, Brittany), as well as on dialects and fine-grained cultural knowledge, a pattern that reflects systemic biases in pre-training data.
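To make the idea of a region-level performance gap concrete, here is a minimal sketch of how graded answers could be broken down by region and by thematic area. The record layout (`region`, `theme`, `correct`), the helper `accuracy_by_group`, and the sample data are all illustrative assumptions; they are not taken from the CARTE benchmark or its evaluation code.

```python
from collections import defaultdict


def accuracy_by_group(records, key):
    """Return {group: accuracy} for records grouped by rec[key] (hypothetical helper)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        total[rec[key]] += 1
        correct[rec[key]] += int(rec["correct"])
    return {group: correct[group] / total[group] for group in total}


# Made-up graded answers for one model; not real benchmark results.
records = [
    {"region": "Île-de-France", "theme": "history", "correct": True},
    {"region": "Île-de-France", "theme": "language", "correct": True},
    {"region": "Corse", "theme": "history", "correct": True},
    {"region": "Corse", "theme": "language", "correct": False},
    {"region": "Bretagne", "theme": "history", "correct": True},
    {"region": "Bretagne", "theme": "language", "correct": False},
]

by_region = accuracy_by_group(records, "region")
by_theme = accuracy_by_group(records, "theme")

# Spread between the best- and worst-scoring region for this model.
region_gap = max(by_region.values()) - min(by_region.values())

for region, acc in sorted(by_region.items(), key=lambda item: item[1]):
    print(f"{region}: {acc:.2f}")
print(f"region gap: {region_gap:.2f}")
```

Reporting accuracy per region and per theme, rather than a single aggregate score, is what lets this kind of analysis surface weaknesses that an overall average would hide.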