Section 01
OMIBench: A Guide to the New Benchmark for Multi-Image Olympiad-Level Reasoning
OMIBench is the first benchmark designed specifically for multi-image, Olympiad-level reasoning. It contains more than 1,000 questions across four domains: biology, chemistry, mathematics, and physics. Even the strongest model evaluated, Gemini-3-Pro, reaches only about 50% accuracy, which points to significant limitations in how current large vision-language models (LVLMs) reason across images. Developed jointly by multiple universities, the benchmark fills a gap left by existing multimodal Olympiad benchmarks, which are limited to single-image settings.