Section 01
Introduction: Core Overview of the Third-Place Solution for CVPR 2026 CASTLE Challenge
This article presents the third-place solution to the CVPR 2026 CASTLE Challenge: a training-free agent framework that uses video knowledge graphs and hierarchical retrieval to achieve efficient long-context understanding over more than 600 hours of multi-view video. By combining the structured representation of a knowledge graph with an adaptive agent workflow, the solution generalizes zero-shot and remains interpretable.
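
To make the idea of hierarchical retrieval over a video knowledge graph concrete, here is a minimal sketch. It is not the authors' implementation: the `Segment` and `Event` classes, the `hierarchical_retrieve` function, the toy data, and the word-overlap score are all hypothetical stand-ins chosen for illustration. The overview does not specify the graph schema or the scoring method, and a real system would presumably use learned embeddings or an agent-driven query rather than word overlap.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    """Fine-grained node: one described moment in a video (hypothetical schema)."""
    timestamp: str
    description: str


@dataclass
class Segment:
    """Coarse node: a summarized stretch of video that owns its events."""
    summary: str
    events: List[Event] = field(default_factory=list)


def overlap(query: str, text: str) -> int:
    """Toy relevance score: count of lowercase words shared by query and text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def hierarchical_retrieve(query: str, segments: List[Segment],
                          top_segments: int = 1, top_events: int = 2) -> List[Event]:
    """Coarse-to-fine search: rank segments by summary, then rank events inside the winners."""
    coarse = sorted(segments, key=lambda s: overlap(query, s.summary), reverse=True)
    candidates = [e for s in coarse[:top_segments] for e in s.events]
    fine = sorted(candidates, key=lambda e: overlap(query, e.description), reverse=True)
    return fine[:top_events]


# Made-up toy graph, for illustration only.
segments = [
    Segment("kitchen cooking dinner", [
        Event("day1 18:05", "person chops onions in kitchen"),
        Event("day1 18:20", "person stirs soup on stove"),
    ]),
    Segment("living room board game", [
        Event("day1 20:10", "group plays board game"),
    ]),
]

hits = hierarchical_retrieve("who chops onions while cooking in the kitchen", segments)
```

The point of the two-level search is that only the events inside the best-matching segments are scored, so the cost of a query grows with the number of segment summaries rather than with the total number of events, which matters when the underlying footage runs to hundreds of hours.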