Section 01
[Introduction] The Gap Between Generative and Retrieval Capabilities of Multimodal Large Language Models
The ACL 2026 study "Generative Giants, Retrieval Weaklings" reports that Multimodal Large Language Models (MLLMs) perform very well on generative tasks such as image captioning and visual question answering, yet show systematic weaknesses on multimodal retrieval tasks. This article examines the root causes of that gap, the experimental evidence for it, and possible directions for improvement, with the aim of clarifying where the capabilities of MLLMs end.