Section 01
[Overview] Multi-OS: Multimodal OOD Synthesis Enhances Out-of-Distribution Detection of Vision-Language Models
This article introduces Multi-OS (Multimodal OOD Synthesis), a method that makes vision-language models (VLMs) more robust and accurate at recognizing inputs from unknown categories by synthesizing multimodal out-of-distribution (OOD) samples. It targets the overconfidence that VLMs show when they encounter OOD samples in real-world deployment, a problem that matters most in high-risk settings such as AI safety and autonomous driving.