Section 01
Introduction: SAMA Framework—A Breakthrough in Video Large Language Models
The SAMA framework, published by the Fudan University team at NeurIPS 2025, for the first time unifies video referential understanding and visual localization into a multi-turn dialogue task. With 239,000 training data samples and complete code open-sourced, it provides a new solution to video understanding challenges.