Section 01
VRCD: A Lightweight Method to Enhance Parallel Decoding Efficiency of Multimodal Large Language Models
Original Authors and Sources
- Original Authors/Maintainers: Yulin Yuan, Hongshuo Zhao, Xiangming Meng (paper authors) / infiniteYuanyl (code repository)
- Source Platforms: GitHub + arXiv
- Original Title: Visual-Redundancy-Controlled Parallel Decoding for Diffusion-Based Multimodal Large Language Models
- Original Links: https://github.com/infiniteYuanyl/VRCD / https://arxiv.org/abs/2605.25820
- Publication/Update Date: 2026-05-25 (paper submission), 2026-05-27 (code update)
Core Insights
VRCD is a lightweight, plug-and-play decoding method for diffusion-based multimodal large language models (dMLLMs). It targets the overlapping visual dependencies that arise during parallel decoding by controlling visual redundancy, and it reports gains in both decoding efficiency and accuracy across multiple benchmarks.