Section 01
[Introduction] WikiVQABench: A New Knowledge-Driven Visual Question Answering Benchmark to Test Multimodal Models' External Knowledge Reasoning Ability
WikiVQABench is a knowledge-driven Visual Question Answering (VQA) benchmark built on Wikipedia and Wikidata. It is designed to evaluate how well Vision-Language Models (VLMs) perform when answering a question requires reasoning over external knowledge. Traditional VQA benchmarks largely overlook this need for knowledge-intensive reasoning, and WikiVQABench is meant to fill that gap. By combining images, article titles, and structured knowledge, it offers a more comprehensive basis for assessing the capabilities of multimodal models.