Section 01
UpstreamQA Framework: A Modular Approach to Video Question Answering Powered by Explicit Reasoning
The research team proposes the UpstreamQA framework to address the limitations of implicit reasoning in Video Question Answering (VideoQA). The framework combines the explicit reasoning capabilities of Large Reasoning Models (LRMs) with the video understanding capabilities of Large Multimodal Models (LMMs), improving both performance and interpretability. This article covers the framework's background, methodology, experiments, advantages, and limitations.