Section 01
Core Guide to the TLG System: Real Annotations Push Video Temporal Reasoning to 71.37% Accuracy
TLG (Temporal-Logic Grounding) is a three-layer system for temporal logic reasoning over video. It reaches 71.37% accuracy on the TimeLogic Challenge benchmark, 24.5 percentage points above the VLM baseline. Its core insight is that real annotations improve accuracy more than model scale does. Using techniques such as timeline reconstruction from source annotations, execution of temporal logic programs, and targeted routing of weak categories, it shows how much can be gained from existing annotation resources.
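To make the three techniques named above more concrete, here is a minimal Python sketch of how they might fit together. It is not TLG's actual implementation: the `Event` type, the relation names, the annotation fields (`label`, `start`, `end`), the `WEAK_CATEGORIES` set, and the fallback handler are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Event:
    """One annotated event; times are in seconds (assumed format)."""
    label: str
    start: float
    end: float


def build_timeline(annotations: List[dict]) -> Dict[str, Event]:
    """Timeline reconstruction: index annotated events by label, in start order."""
    events = sorted(
        (Event(a["label"], a["start"], a["end"]) for a in annotations),
        key=lambda e: (e.start, e.end),
    )
    return {e.label: e for e in events}


# Temporal logic "programs" reduced to interval predicates (illustrative set).
RELATIONS: Dict[str, Callable[[Event, Event], bool]] = {
    "before": lambda a, b: a.end <= b.start,
    "after": lambda a, b: a.start >= b.end,
    "during": lambda a, b: b.start <= a.start and a.end <= b.end,
    "overlaps": lambda a, b: a.start < b.end and b.start < a.end,
}


def execute(timeline: Dict[str, Event], relation: str, first: str, second: str) -> bool:
    """Program execution: evaluate one relation directly on the timeline."""
    return RELATIONS[relation](timeline[first], timeline[second])


# Hypothetical: categories assumed to be weak for direct execution.
WEAK_CATEGORIES = {"overlaps"}


def answer(timeline, relation, first, second, fallback):
    """Routing: send weak categories or unannotated events to a fallback handler."""
    if relation in WEAK_CATEGORIES or first not in timeline or second not in timeline:
        return fallback(relation, first, second)
    return execute(timeline, relation, first, second)


# Made-up annotations, for demonstration only.
annotations = [
    {"label": "pour water", "start": 5.0, "end": 9.0},
    {"label": "open lid", "start": 1.0, "end": 3.0},
    {"label": "stir", "start": 6.0, "end": 8.0},
]
timeline = build_timeline(annotations)
fallback = lambda relation, first, second: "routed"

print(answer(timeline, "before", "open lid", "pour water", fallback))
print(answer(timeline, "overlaps", "stir", "pour water", fallback))
```

In this sketch, questions the timeline can settle are answered by evaluating the relation on annotated intervals, and only the remaining cases are handed to another handler, which here is just a stub.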