Section 01
ReactBench: A Guide to the Causality-Driven Multimodal Hallucination Evaluation Benchmark
ReactBench is a multimodal hallucination evaluation benchmark that assesses hallucination in multimodal large language models (MLLMs) from a causality-driven perspective instead of only detecting hallucinated outputs, and it is presented as the first benchmark to do so. It targets three weaknesses of existing benchmarks: they focus only on hallucination results, rely on simplified scenarios, and fail to challenge state-of-the-art models. To address these, ReactBench adopts a multi-task design and an exam-style evaluation format that systematically expose and diagnose the causes of hallucinations. Its core components include four targeted tasks and a chain-of-thought (CoT) reasoning diagnosis method. Experiments show that current models remain vulnerable to hallucination, a finding that matters for the further development of multimodal AI.