Section 01
CLVG-Bench: A Systematic Evaluation Framework for Multimodal Reasoning Capabilities of Video Models (Introduction)
CLVG-Bench is a systematic evaluation framework that targets gaps in the multimodal reasoning capabilities of current video generation models. It introduces an evaluation paradigm for context learning-based video generation and uses an adaptive video evaluator to expose the limitations of state-of-the-art video models (such as Sora and Runway Gen-3) in physical reasoning, causal reasoning, and other aspects. In doing so, it promotes a shift in video generation evaluation from "quality-oriented" to "capability-oriented."