Section 01
V-CAST: A Guide to Curvature-Aware Spatio-Temporal Pruning for Efficient Video Large Language Models
V-CAST is a training-free, plug-and-play token pruning strategy for video large language models. It combines a curvature-guided temporal allocation mechanism with a dual-anchor spatial selection mechanism. It retains 98.6% of the original performance while reducing peak memory and total latency to 86.7% and 86.4%, respectively, of the Qwen3-VL-8B-Instruct baseline, which mitigates the token explosion problem.
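To make the idea of curvature-guided temporal allocation more concrete, here is a minimal sketch in plain Python. It is not V-CAST's actual formulation, which this summary does not spell out. It assumes that "curvature" can be approximated by the magnitude of the discrete second difference of per-frame feature vectors, and that the token budget is split across frames in proportion to that score, with a small per-frame floor. The function names `curvature_scores` and `allocate_budget`, the `floor` parameter, and the toy features are all hypothetical. The dual-anchor spatial selection step is not illustrated.

```python
import math


def curvature_scores(frames):
    """Approximate temporal curvature per frame (assumed definition).

    Uses the norm of the discrete second difference
    f[t-1] - 2*f[t] + f[t+1]; the two endpoint frames reuse the score
    of their nearest interior neighbor.
    """
    n = len(frames)
    if n < 3:
        # Too few frames to form a second difference: treat all equally.
        return [1.0] * n
    inner = []
    for t in range(1, n - 1):
        second_diff = [
            prev - 2 * cur + nxt
            for prev, cur, nxt in zip(frames[t - 1], frames[t], frames[t + 1])
        ]
        inner.append(math.sqrt(sum(x * x for x in second_diff)))
    return [inner[0]] + inner + [inner[-1]]


def allocate_budget(scores, total_tokens, floor=1):
    """Split total_tokens across frames proportionally to scores.

    Every frame keeps at least `floor` tokens; rounding leftovers go to
    the frames with the largest fractional share, so the budgets sum
    exactly to total_tokens.
    """
    n = len(scores)
    if total_tokens < floor * n:
        raise ValueError("total_tokens is too small for the requested floor")
    spare = total_tokens - floor * n
    total_score = sum(scores)
    if total_score > 0:
        weights = [s / total_score for s in scores]
    else:
        weights = [1.0 / n] * n  # flat video: fall back to a uniform split
    raw = [w * spare for w in weights]
    budgets = [floor + int(r) for r in raw]
    leftover = total_tokens - sum(budgets)
    by_fraction = sorted(range(n), key=lambda i: raw[i] - int(raw[i]), reverse=True)
    for i in by_fraction[:leftover]:
        budgets[i] += 1
    return budgets


# Toy 1-D "features" for six frames, with an abrupt change at frame 3.
frames = [[0.0], [0.0], [0.0], [1.0], [0.0], [0.0]]
scores = curvature_scores(frames)
budgets = allocate_budget(scores, total_tokens=16, floor=1)
print(scores)
print(budgets)
```

In this toy input, the frame where the feature trajectory bends most sharply receives the largest share of the 16-token budget, while the static frames at the start fall back to the floor. A real system would compute the scores from visual encoder features rather than hand-written numbers.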