Section 01
DiM³: An Innovative Method to Endow Multimodal Models with Multilingual Capabilities Without Retraining
DiM³ is a training-free method that adds multilingual capabilities to multimodal models across 57 languages through direction- and magnitude-aware parameter merging. Its reported performance is comparable to dedicated multilingual multimodal fine-tuning, which avoids the high cost of the traditional approach of training a single model for both capabilities.
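The summary above does not spell out the merge rule, so the following is only a generic illustration of what "direction- and magnitude-aware" merging can look like, not DiM³'s actual procedure. It assumes three weight arrays of the same shape (a shared base, a multilingual fine-tune, and a multimodal fine-tune); the function name `merge_direction_magnitude` and the specific rule (sum updates that agree in sign, keep the larger update where they conflict) are hypothetical choices made for this sketch.

```python
import numpy as np


def merge_direction_magnitude(base, multilingual, multimodal):
    """Toy merge of two fine-tuned weight arrays that share one base.

    Hypothetical rule for illustration only; not the DiM3 algorithm.
    """
    # Task vectors: how far each fine-tuned model moved away from the base.
    delta_lang = multilingual - base
    delta_mm = multimodal - base

    # Direction: entries where the two updates do not point in opposite directions.
    agree = np.sign(delta_lang) * np.sign(delta_mm) >= 0

    # Magnitude: where the updates conflict, keep whichever is larger in size.
    dominant = np.where(
        np.abs(delta_lang) >= np.abs(delta_mm), delta_lang, delta_mm
    )

    # Sum agreeing updates; fall back to the dominant update on conflicts.
    merged_delta = np.where(agree, delta_lang + delta_mm, dominant)
    return base + merged_delta


if __name__ == "__main__":
    base = np.zeros(3)
    multilingual = np.array([0.2, -0.5, 0.0])
    multimodal = np.array([0.1, 0.3, 0.4])
    print(merge_direction_magnitude(base, multilingual, multimodal))
```

The point of the sketch is that merging operates on stored weights alone, with no gradient steps, which is what makes an approach of this kind training-free.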