Section 01
Comparative Evaluation of Multimodal Image Captioning Models: Semantic Alignment Analysis of Open-Source vs. Commercial Solutions (Introduction)
This project compares the image captioning performance of the commercial model Gemini 2.5 Flash-Lite and the open-source model Qwen3-VL-8B-Abliterated-Caption-it on the Flickr8k dataset. It measures how closely each model's captions align with the reference captions, using ROUGE-L for word-sequence overlap and BERTScore for embedding-based semantic similarity, and it discusses deployment trade-offs to give developers and research teams a reference point for model selection.
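To make the ROUGE-L metric concrete, here is a minimal sketch of how it can be computed for a single caption pair. It is not the project's evaluation code. It assumes lowercased whitespace tokenization and the balanced F1 variant, and the function names `lcs_length` and `rouge_l` are illustrative. Established ROUGE packages may tokenize, stem, and weight precision against recall differently, so their scores can differ from this sketch.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    # Dynamic programming over two rows: prev[j] holds the LCS length
    # of the tokens of `a` seen so far against the first j tokens of `b`.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L precision, recall, and F1 for one candidate/reference pair.

    Assumption: lowercased whitespace tokenization, no stemming.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}

    lcs = lcs_length(cand, ref)
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    if lcs == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical caption pair, not taken from Flickr8k.
    scores = rouge_l("a dog runs on the grass",
                     "a dog is running on the grass")
    print(scores)
```

For the hypothetical pair in the usage example, the longest common subsequence is "a dog on the grass" (5 tokens), so precision is 5/6, recall is 5/7, and F1 is 10/13, roughly 0.77. The pair also shows why an embedding-based metric such as BERTScore is a useful complement: "runs" and "is running" mean nearly the same thing, yet contribute nothing to the ROUGE-L overlap.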