Section 01
[Introduction] MMNeedle: A Systematic Benchmark for the Long-Context Capabilities of Multimodal Large Models
The MMNeedle benchmark, introduced in an NAACL 2025 oral paper, evaluates the long-context localization ability of multimodal large language models (MLLMs) through a "needle-in-a-haystack" task: a target sub-image (the needle) is hidden among a large set of input images (the haystack), and the model must pinpoint its location. The results reveal clear performance bottlenecks in mainstream models on multi-image scenarios. The benchmark fills a gap in existing long-context evaluations, provides a standardized tool for developing multimodal AI, and its open-source release encourages community collaboration.
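To make the task setup concrete, here is a minimal toy sketch of a needle-in-a-haystack construction. It is not the authors' code: the function names and the use of integer IDs as stand-ins for real images are illustrative assumptions; the actual benchmark works with stitched image grids.

```python
import random

def build_needle_task(num_rows, num_cols, seed=0):
    """Toy needle-in-a-haystack setup (illustrative, not MMNeedle's code):
    arrange distractor 'images' (here, tagged tuples standing in for real
    images) in a grid and hide one needle at a random cell. A model would
    be shown the stitched grid plus the needle image and asked to output
    the needle's (row, col) position."""
    rng = random.Random(seed)
    grid = [[("distractor", r, c) for c in range(num_cols)]
            for r in range(num_rows)]
    needle_row = rng.randrange(num_rows)
    needle_col = rng.randrange(num_cols)
    grid[needle_row][needle_col] = ("needle", needle_row, needle_col)
    return grid, (needle_row, needle_col)

def exact_match(prediction, ground_truth):
    """Localization is scored by exact match: the prediction counts as
    correct only if both the row and column indices are right."""
    return prediction == ground_truth

# Build a 4x4 haystack and check where the needle landed.
grid, target = build_needle_task(4, 4, seed=42)
```

Scoring by exact index match makes the metric strict: a model that identifies the right image but the wrong cell still receives no credit.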