Section 01
[Introduction] ChronoPhyBench: A New Benchmark for Testing MLLMs' Physical Understanding Capabilities
ChronoPhyBench is a new benchmark for multimodal physical dynamics reasoning. It is designed to test whether Multimodal Large Language Models (MLLMs) genuinely perform cross-modal physical reasoning or merely fall back on linguistic priors to produce "hallucinatory" reasoning. Using sequential physical state prediction tasks, the benchmark separates a model's actual physical understanding from its reliance on linguistic shortcuts. Experiments show that the physical reasoning capabilities of current open-source MLLMs are still at an early stage, a finding that can help guide the development of Physical AI and Artificial General Intelligence (AGI).
Source: arXiv 2026-06-06, Link: http://arxiv.org/abs/2606.07962v1