Section 01
Introduction: QUACK—The First Multimodal Social Reasoning Evaluation Benchmark for Vision-Language Models
QUACK (Questioning, Understanding, and Assessing Collaborative Knowledge) is the first multimodal social reasoning benchmark designed specifically for vision-language models (VLMs), and it is built on a fully open-source engine. It addresses a gap left by traditional text-only evaluations, which do not exercise a model's visual input. QUACK assesses spatial reasoning, social reasoning, and deception detection through mechanisms such as navigation on a graph-structured map, observation restricted to a limited field of view, and multi-round discussion and voting. The benchmark supports multi-model comparison experiments and provides a reproducible evaluation environment.
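To make these mechanisms concrete, the following is a minimal Python sketch of how a graph-structured map, a limited field of view, and a voting round might fit together. It is not taken from the QUACK engine: the room names, the hop-based visibility radius, the tie rule, and the functions `visible_rooms`, `observe`, and `tally_votes` are all assumptions made for illustration.

```python
from collections import Counter, deque

# Hypothetical map: rooms are nodes, corridors are undirected edges.
# Room names are illustrative and not taken from QUACK.
MAP = {
    "cafeteria": ["hall", "storage"],
    "hall": ["cafeteria", "lab"],
    "storage": ["cafeteria"],
    "lab": ["hall"],
}


def visible_rooms(graph, position, radius=1):
    """Return the rooms within `radius` hops of `position` (breadth-first search)."""
    seen = {position}
    frontier = deque([(position, 0)])
    while frontier:
        room, dist = frontier.popleft()
        if dist == radius:
            continue  # do not expand beyond the field-of-view radius
        for neighbor in graph[room]:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, dist + 1))
    return seen


def observe(graph, positions, agent, radius=1):
    """List the other agents that `agent` can currently see, sorted by name."""
    view = visible_rooms(graph, positions[agent], radius)
    return sorted(a for a, room in positions.items() if a != agent and room in view)


def tally_votes(votes):
    """Return the agent with the most votes, or None on a tie or an empty ballot."""
    counts = Counter(votes.values()).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # assumed rule: a tie eliminates nobody
    return counts[0][0]


positions = {"A": "cafeteria", "B": "hall", "C": "lab"}
print(observe(MAP, positions, "A"))                    # agents visible to A
print(tally_votes({"A": "C", "B": "C", "C": "A"}))     # outcome of one voting round
```

In this sketch each agent sees only the agents in its own room and in directly connected rooms, so different agents hold different partial views of the same state. That asymmetry is what gives a discussion round something to resolve before the vote.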