Section 01
Guide to the 2026 SKKU Multimodal AI Challenge Solution
The 2026 Sungkyunkwan University (SKKU) Multimodal AI Challenge centers on image-text Visual Question Answering (VQA), with the goal of building a fair and reliable model. This solution combines the Qwen3-VL MoE model with a multi-agent debate architecture and targets two problems: data bias and the calibration of answer abstention. It follows a text-first principle to avoid image-induced bias, makes calibrated abstention decisions, and offers a reference for designing fair multimodal AI systems.
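To make the idea of an abstention decision concrete, here is a minimal sketch of confidence-thresholded abstention. It is not the solution's actual implementation: the function name `answer_or_abstain`, the `ABSTAIN` label, and the default threshold of 0.7 are all hypothetical, and the per-option probabilities are assumed to be already calibrated.

```python
from typing import Dict

ABSTAIN = "unknown"  # hypothetical label for the abstention option


def answer_or_abstain(option_probs: Dict[str, float], threshold: float = 0.7) -> str:
    """Return the most probable option, or abstain when confidence is too low.

    option_probs maps each candidate answer to a probability that is
    assumed (not guaranteed) to be calibrated. The threshold is illustrative.
    """
    if not option_probs:
        # Nothing to choose from, so abstaining is the only safe outcome.
        return ABSTAIN
    # Pick the option with the highest probability.
    best_option = max(option_probs, key=option_probs.get)
    # Commit to the answer only if its probability clears the threshold.
    if option_probs[best_option] >= threshold:
        return best_option
    return ABSTAIN
```

Under these assumptions, a confident distribution such as `{"A": 0.9, "B": 0.1}` yields an answer, while an evenly split one such as `{"A": 0.5, "B": 0.5}` yields the abstention label. In practice the threshold would be tuned on held-out data rather than fixed by hand.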