Section 01
[Introduction] QG-CoC: Question-Guided Caption Chains Enhance Multi-Image Reasoning Capabilities of Multimodal Large Models
QG-CoC is a zero-shot prompting method for multimodal large models. It uses the question to guide the generation of a chain of image captions, which helps the model perceive and reason at a finer granularity in multi-image scenarios. The method was proposed by researchers from institutions including the University of California, Los Angeles, and the paper will be presented at EMNLP 2025.