Section 01
Introduction: Visual Question Answering for Smartphone Photo Albums—Real-World Challenges for Multimodal AI
This article introduces the AI challenge problem of the DACON 2025 Samsung Collegiate Programming Contest, which aims to develop multimodal AI models capable of understanding everyday photos in smartphone users' albums, and explores how Visual Question Answering (VQA) applies in real-world scenarios. The task combines computer vision and natural language processing, has to cope with the distinctive challenges posed by real users' photos, and has broad practical value.