Section 01
【Introduction】Development Framework for Image-Text Question Answering Models Based on Multimodal AI (Open-Source Baseline for SKKU Challenge)
This article presents an open-source vision-language model (VLM) baseline framework designed for the 2026 SKKU Multimodal AI Challenge. Its core features are support for local inference, strict compliance with the challenge's fair-competition rules, and a complete experimental toolchain. Maintained by gongpil00 and released on GitHub on June 2, 2026, the project aims to help participants get started quickly and build on a reliable development foundation.
Keywords: Multimodal AI, Vision-Language Model, Image-Text Q&A, VLM, Open-Source Framework, SKKU Challenge, Local Inference, Large Language Model