Section 01
Tango Framework: A New Breakthrough in Efficient Inference for Video Large Language Models
Tango is a token pruning framework designed to improve inference efficiency in video large language models (video LLMs). Its two core innovations are a diversity-driven attention selection strategy and Spatio-Temporal Rotary Position Encoding (ST-RoPE). With only 10% of video tokens retained, Tango preserves 98.9% of the unpruned model's performance and achieves a 1.88x inference speedup, offering a new path toward efficient inference for video LLMs.
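The two ideas named above can be illustrated with a minimal pure-Python sketch. It rests on assumptions that are not taken from the Tango paper. First, diversity-driven selection is modeled as a greedy, maximal-marginal-relevance-style rule that rewards a token's attention score and penalizes its cosine similarity to tokens already kept. Second, ST-RoPE is modeled as standard rotary encoding applied to three equal channel groups, driven by each token's frame index t, row h and column w. All names (`diverse_select`, `lam`, `keep_ratio`, `grid_coords`, `st_rope`) are hypothetical. Kept tokens retain their original (t, h, w) coordinates instead of being renumbered, which is one plausible reason a spatio-temporal position scheme helps after pruning.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def diverse_select(features: Sequence[Sequence[float]],
                   attn: Sequence[float],
                   keep_ratio: float = 0.1,
                   lam: float = 0.5) -> List[int]:
    """Greedily keep tokens with high attention that are also dissimilar to
    tokens already kept (MMR-style rule; an assumption, not Tango's method)."""
    n = len(features)
    k = min(n, max(1, round(n * keep_ratio)))  # e.g. keep 10% of tokens
    selected: List[int] = []
    remaining = list(range(n))
    while len(selected) < k:
        best_i, best_score = remaining[0], -math.inf
        for i in remaining:
            # Redundancy = highest similarity to any token already kept.
            redundancy = max((cosine(features[i], features[j]) for j in selected),
                             default=0.0)
            score = lam * attn[i] - (1.0 - lam) * redundancy
            if score > best_score:
                best_i, best_score = i, score
        selected.append(best_i)
        remaining.remove(best_i)
    return sorted(selected)


def grid_coords(index: int, height: int, width: int) -> Tuple[int, int, int]:
    """Map a flattened (frame, row, col) token index back to (t, h, w)."""
    t, rest = divmod(index, height * width)
    h, w = divmod(rest, width)
    return t, h, w


def rope_rotate(vec: Sequence[float], pos: int, base: float = 10000.0) -> List[float]:
    """Standard 1-D RoPE: rotate consecutive pairs by pos * base**(-2i/d)."""
    d = len(vec)
    assert d % 2 == 0, "RoPE needs an even dimension"
    out: List[float] = []
    for i in range(0, d, 2):
        angle = pos * base ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        out += [vec[i] * c - vec[i + 1] * s, vec[i] * s + vec[i + 1] * c]
    return out


def st_rope(vec: Sequence[float], t: int, h: int, w: int) -> List[float]:
    """Hypothetical ST-RoPE: three equal channel groups rotated by the
    token's time, row and column position respectively."""
    d = len(vec)
    assert d % 6 == 0, "need three channel groups of even size"
    g = d // 3
    return (rope_rotate(vec[:g], t)
            + rope_rotate(vec[g:2 * g], h)
            + rope_rotate(vec[2 * g:], w))


if __name__ == "__main__":
    feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    attn = [0.9, 0.9, 0.5, 0.1]
    kept = diverse_select(feats, attn, keep_ratio=0.5)
    print(kept)  # [0, 2]: the duplicate of token 0 is skipped
    # A kept token keeps its original grid position, e.g. index 5 in a 2x2 grid.
    print(grid_coords(5, 2, 2))  # (1, 0, 1)
```

In the demo, plain top-k by attention would keep tokens 0 and 1, which are identical; the diversity penalty keeps token 2 instead. The weight `lam` is a hypothetical knob: at `lam = 1.0` the rule reduces to attention-only top-k.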