Section 01
Introduction: EgoCoT-Bench—A New Verifiable Reasoning Benchmark for First-Person Video Understanding
This article introduces EgoCoT-Bench, a verifiable benchmark for evaluating how well multimodal large language models perform fine-grained action reasoning over first-person (egocentric) videos. It comprises 3,172 QA pairs with step-by-step reasoning annotations, and it exposes key weaknesses in the evidence consistency of current models. By emphasizing the verifiability of the reasoning process, the benchmark provides a tool for evaluating how well models truly understand these videos.