Section 01
[Introduction] The Phenomenon of Silenced Visual Latents and a New Paradigm for In-Inference Optimization
This article describes how visual latents are systematically suppressed in multimodal large language models and proposes a two-stage in-inference optimization method that requires no parameter updates. The method is intended to release the suppressed visual reasoning capability, opening a new path for improving the performance of multimodal models.