Section 01
Introduction: iLLaVA, End-to-End Efficiency Optimization for Multimodal Large Models, Accepted at ICLR 2026
The team at Tianjin University proposes iLLaVA, a method that accelerates inference end to end by recursively merging redundant visual tokens in both the vision encoder and the LLM stages: it roughly doubles throughput and reduces prefill time by 4x while preserving model performance. The work has been accepted at ICLR 2026, and the code is open-sourced.
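To make the idea of merging redundant visual tokens concrete, below is a minimal sketch of one common approach, bipartite soft matching in the style of ToMe: tokens are split into two sets, the most similar cross-set pairs are averaged together, and the sequence shrinks by `r` tokens. This is a generic illustration under assumed interfaces, not iLLaVA's exact algorithm; the function name and shapes are hypothetical.

```python
import numpy as np

def merge_visual_tokens(tokens: np.ndarray, r: int) -> np.ndarray:
    """Merge r redundant tokens via bipartite soft matching (ToMe-style sketch).

    tokens: (n, d) array of token embeddings; returns an (n - r, d) array.
    Hypothetical illustration -- not iLLaVA's exact merging rule.
    """
    a, b = tokens[0::2].copy(), tokens[1::2].copy()   # split into two sets
    a_n = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_n = b / np.linalg.norm(b, axis=1, keepdims=True)
    sim = a_n @ b_n.T                                 # cosine similarities
    best = sim.max(axis=1)                            # best match score per a-token
    match = sim.argmax(axis=1)                        # index of that match in b
    merge_idx = np.argsort(-best)[:r]                 # r most redundant a-tokens
    keep_idx = np.setdiff1d(np.arange(len(a)), merge_idx)
    for i in merge_idx:                               # fold merged a-tokens into b
        j = match[i]
        b[j] = (b[j] + a[i]) / 2
    return np.concatenate([a[keep_idx], b], axis=0)
```

Applying this at several layers of the encoder and the LLM shrinks the visual token count progressively, which is where the throughput and prefill savings come from.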