Section 01
Q-GeoMem: A Question-Guided Geometric Memory Framework Revolutionizing Video Spatial Reasoning
Q-GeoMem is a framework for video spatial reasoning. Its core is a question-guided geometric memory mechanism that integrates camera-conditioned geometry into visual tokens. This design targets two weaknesses of traditional methods, memory redundancy and weak long-range reasoning, and the framework achieves state-of-the-art performance on VSI-Bench and VSTI-Bench.
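To make the idea of a question-guided memory more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not Q-GeoMem's actual implementation: the summary above does not specify how entries are scored or fused, so the cosine-similarity top-k selection, the residual cross-attention step, and the names `select_memory` and `fuse` are all hypothetical. Each memory row is assumed to be a geometry feature that has already been conditioned on camera pose.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def select_memory(question, memory, k):
    """Return indices of the k memory rows most similar to the question.

    Assumption: relevance is cosine similarity between the question
    embedding (shape [d]) and each memory entry (shape [n, d]).
    Dropping low-scoring rows is one way to reduce memory redundancy.
    """
    q = question / np.linalg.norm(question)
    m = memory / np.linalg.norm(memory, axis=1, keepdims=True)
    scores = m @ q
    top_k = np.argsort(scores)[-k:]
    return np.sort(top_k)  # keep the original (temporal) order


def fuse(visual_tokens, selected_memory):
    """Inject selected geometry entries into visual tokens.

    Assumption: fusion is a single residual cross-attention step in
    which visual tokens act as queries and memory rows as keys/values.
    """
    d = visual_tokens.shape[1]
    attn = softmax(visual_tokens @ selected_memory.T / np.sqrt(d), axis=-1)
    return visual_tokens + attn @ selected_memory


# Toy usage: 4 memory entries of dimension 4, keep the 2 most relevant.
question = np.array([1.0, 0.0, 0.0, 0.0])
memory = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])
keep = select_memory(question, memory, k=2)
fused = fuse(np.ones((3, 4)), memory[keep])
```

In this toy setup the question only ever sees the retained rows, which is the sense in which a question-guided memory can stay compact over long videos. The real framework may score, store, and fuse entries quite differently.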