Section 01
[Introduction] MoIR: A Novel Information Routing Method to Address Modality Dominance in Vision-Language Models
Vision-Language Models (VLMs) often suffer from the modality dominance problem: they over-rely on a single modality while ignoring the others. Traditional methods that only adjust attention allocation cannot compensate for the missing information itself, because reweighting attention over information-poor tokens does not add any information to them. MoIR (Multimodal Information Router) instead identifies low-information-density tokens and routes supplementary information from the dominant modality into them, constructing information-dense representations. This significantly improves the model's robustness and its downstream performance on multimodal tasks.
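To make the routing idea concrete, here is a minimal pure-Python sketch. It is not MoIR's actual implementation. Everything in it is a labeled assumption chosen for illustration: the hypothetical function `route_information`, the use of the L2 norm as a proxy for "information density", the bottom-`k` rule for picking low-density tokens, and the scaled dot-product attention over dominant-modality tokens, mixed back in with a hypothetical gate `alpha`. The sketch shows only the general pattern. It finds the weakest tokens, gathers context from the dominant modality, and adds that context to those tokens.

```python
import math
from typing import List

Vector = List[float]


def l2_norm(v: Vector) -> float:
    # ASSUMPTION: L2 norm stands in for "information density"; MoIR's real
    # criterion is not specified here.
    return math.sqrt(sum(x * x for x in v))


def softmax(scores: List[float]) -> List[float]:
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_information(weak: List[Vector], dominant: List[Vector],
                      k: int, alpha: float = 0.5) -> List[Vector]:
    """Hypothetical MoIR-style routing sketch (not the paper's method).

    1. Score each weak-modality token with the density proxy.
    2. Select the k lowest-scoring tokens as "low-information-density".
    3. For each selected token, attend over the dominant-modality tokens
       (scaled dot product) and add the attention-weighted summary,
       scaled by the gate alpha. Other tokens are copied unchanged.
    """
    if not weak or not dominant or k <= 0:
        return [list(tok) for tok in weak]
    d = len(weak[0])
    order = sorted(range(len(weak)), key=lambda i: l2_norm(weak[i]))
    low_density = set(order[:k])

    out: List[Vector] = []
    for i, tok in enumerate(weak):
        if i not in low_density:
            out.append(list(tok))
            continue
        # Attention scores of this token against every dominant-modality token.
        scores = [sum(a * b for a, b in zip(tok, dom)) / math.sqrt(d)
                  for dom in dominant]
        weights = softmax(scores)
        # Weighted summary of the dominant modality, one coordinate at a time.
        summary = [sum(w * dom[j] for w, dom in zip(weights, dominant))
                   for j in range(d)]
        out.append([t + alpha * s for t, s in zip(tok, summary)])
    return out


weak_tokens = [[1.0, 0.0], [0.0, 0.0], [0.0, 2.0]]
dominant_tokens = [[1.0, 1.0], [3.0, 1.0]]
print(route_information(weak_tokens, dominant_tokens, k=1))
# → [[1.0, 0.0], [1.0, 0.5], [0.0, 2.0]]
```

In the usage example, the all-zero token has the lowest density score, so it is the only token enriched. Its attention scores are all equal, which means it receives half the mean of the dominant tokens (`alpha = 0.5`). In a real VLM, the density criterion and the fusion step would be learned components, and the design of those components is where methods like MoIR differ.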