Section 01
AdaCodec: Predictive Visual Coding Boosts Video Multimodal Large Language Model Efficiency by 7x (Introduction)
Core Insights: AdaCodec applies predictive visual coding to the temporal redundancy of video. It transmits a full reference frame only when necessary and uses compact P-tokens to describe the changes in the frames in between. This achieves a 7x efficiency boost for video MLLMs while matching or even improving performance.
Original Authors & Sources:
- Research Team: Paper author team (arXiv submission)
- Source Platform: arXiv
- Original Title: AdaCodec: A Predictive Visual Code for Video MLLMs
- Original Link: http://arxiv.org/abs/2606.02569v1
- Publication Date: June 1, 2026
Keywords: Video understanding, multimodal large models, visual coding, predictive coding, efficiency optimization, video MLLM, token compression, temporal redundancy
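To make the reference-frame / P-token idea from the Core Insights concrete, here is a minimal Python sketch. It is not the authors' method. It swaps in classic conditional replenishment: a "P-token" here is simply a patch embedding that changed by more than a threshold. AdaCodec's actual P-token design is not described in this section. All names (`CodedFrame`, `encode`, `decode`, `patch_tau`, `refresh_ratio`) and the list-of-patch-embeddings frame representation are hypothetical, and the threshold values are arbitrary.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Patch = List[float]   # one patch embedding (hypothetical representation)
Frame = List[Patch]   # a frame is a fixed-length list of patch embeddings


@dataclass
class CodedFrame:
    kind: str                        # "I" = full reference frame, "P" = predicted frame
    tokens: List[Tuple[int, Patch]]  # (patch index, embedding) pairs actually transmitted


def l2(a: Patch, b: Patch) -> float:
    """Euclidean distance between two patch embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def encode(frames: List[Frame], patch_tau: float = 0.5,
           refresh_ratio: float = 0.5) -> List[CodedFrame]:
    """Emit an I-frame when too many patches changed, else P-tokens for changed patches."""
    coded: List[CodedFrame] = []
    reference: Optional[Frame] = None  # what the decoder currently holds
    for frame in frames:
        if reference is None:
            changed = list(range(len(frame)))
        else:
            # Patches whose embedding drifted more than patch_tau from the decoder's copy
            changed = [i for i, (p, r) in enumerate(zip(frame, reference))
                       if l2(p, r) > patch_tau]
        if reference is None or len(changed) > refresh_ratio * len(frame):
            # First frame or large change: send the full reference frame
            coded.append(CodedFrame("I", [(i, list(p)) for i, p in enumerate(frame)]))
            reference = [list(p) for p in frame]
        else:
            # Small change: send only the changed patches as compact P-tokens
            coded.append(CodedFrame("P", [(i, list(frame[i])) for i in changed]))
            for i in changed:
                reference[i] = list(frame[i])
    return coded


def decode(coded: List[CodedFrame], num_patches: int) -> List[Frame]:
    """Rebuild frames: I-frames reset the state, P-tokens overwrite changed patches."""
    frames: List[Frame] = []
    current: List[Optional[Patch]] = [None] * num_patches
    for cf in coded:
        if cf.kind == "I":
            current = [None] * num_patches
        for i, p in cf.tokens:
            current[i] = list(p)
        frames.append([list(p) for p in current])
    return frames


if __name__ == "__main__":
    frames = [
        [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]],  # reference
        [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]],  # one patch moves
        [[5.0, 5.0], [5.0, 5.0], [5.0, 5.0], [0.0, 0.0]],  # scene change
    ]
    coded = encode(frames)
    print([c.kind for c in coded])             # ['I', 'P', 'I']
    print(sum(len(c.tokens) for c in coded))   # 9 tokens instead of 12
```

In this toy run the second frame costs one token instead of four, while the scene change triggers a fresh reference frame. The 9-versus-12 count illustrates the mechanism only and says nothing about the paper's reported 7x figure. Because drift below `patch_tau` is never transmitted, reconstruction is lossy in general, which is the trade-off any such threshold-based scheme makes.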