Section 01
[Introduction] DualVision: A Multimodal Large Model Fusing Infrared and Visible Light for Robust Visual Reasoning in Adverse Weather
A team from the University of Wisconsin-Madison and Amazon proposes DualVision, which injects infrared image information into multimodal large language models through a lightweight cross-modal fusion module. It achieves a 75% reduction in computational load and significant performance improvements under degraded conditions such as fog, low light, and blur, making it applicable to scenarios such as autonomous driving and security monitoring.