Section 01
[Introduction] R3-CoVR: A Reasoning-Aware Framework for Zero-Shot Compositional Video Retrieval
This article introduces R3-CoVR, a framework for the Compositional Video Retrieval (CoVR) task, in which a user supplies a reference video together with a text modification instruction and the system must find the target video that matches both. R3-CoVR performs zero-shot retrieval with a frozen foundation model through a three-stage "Reasoning-Retrieval-Reranking" pipeline, and reaches 91.9% R@1 (recall at rank 1) on the test set of the CVPR 2026 VidLLMs Challenge.
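To make the three-stage shape concrete, here is a minimal sketch of a "Reasoning-Retrieval-Reranking" flow and of how R@1 is computed. It is not the R3-CoVR implementation. Every function name (`reason`, `retrieve`, `rerank`, `run_pipeline`, `recall_at_k`) is hypothetical. Videos are represented by short text captions, and a toy bag-of-words similarity stands in for the frozen foundation model. The scoring rule in each stage is an assumption chosen only for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; an assumed stand-in for a frozen encoder."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def reason(reference_caption, modification):
    """Stage 1 (hypothetical): produce a text description of the target.
    A real system would prompt a frozen model; here we only concatenate."""
    return f"{reference_caption} {modification}"


def retrieve(query, gallery, k):
    """Stage 2: rank all gallery videos by similarity and keep the top k."""
    q = embed(query)
    ranked = sorted(gallery, key=lambda vid: cosine(q, embed(gallery[vid])),
                    reverse=True)
    return ranked[:k]


def rerank(candidates, modification, gallery):
    """Stage 3 (hypothetical): reorder the shortlist by how well each
    candidate matches the modification instruction itself."""
    m = embed(modification)
    return sorted(candidates, key=lambda vid: cosine(m, embed(gallery[vid])),
                  reverse=True)


def run_pipeline(reference_caption, modification, gallery, k=2):
    """Chain the three stages and return the final ranked shortlist."""
    query = reason(reference_caption, modification)
    shortlist = retrieve(query, gallery, k)
    return rerank(shortlist, modification, gallery)


def recall_at_k(rankings, targets, k=1):
    """Fraction of queries whose target appears in the top k results."""
    hits = sum(1 for ranked, target in zip(rankings, targets)
               if target in ranked[:k])
    return hits / len(targets)


# Toy gallery: video id -> caption (made-up data for illustration only).
gallery = {
    "v1": "a dog runs on the beach",
    "v2": "a dog runs in the snow",
    "v3": "a cat sleeps on a sofa",
}

result = run_pipeline("a dog runs on the beach", "in the snow", gallery)
print(result)
print(recall_at_k([result], ["v2"], k=1))
```

In this toy data the retrieval stage alone cannot separate `v1` from `v2`, because both overlap equally with the composed query. The reranking stage breaks the tie in favor of the video that actually reflects the modification. R@1 is then the share of queries whose correct target lands in first place.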