Section 01
[Introduction] New Paradigm for Visual Evidence Selection in Multimodal RAG: From Semantic Relevance to Information Gain
This paper proposes an information-theoretic framework for selecting visual evidence in multimodal Retrieval-Augmented Generation (RAG). It defines the utility of a piece of evidence as the information gain it yields on the model's output distribution, which addresses the utility mismatch that arises when traditional methods rank evidence by semantic relevance. A lightweight proxy model estimates this utility efficiently, so the framework improves performance while reducing computational cost.
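To make the idea of utility as information gain concrete, here is a minimal sketch in Python. It assumes that information gain is measured as the reduction in Shannon entropy of the output distribution once a piece of evidence is added; the paper's exact definition may differ. The function names (`entropy`, `information_gain`, `select_evidence`) and the toy distributions are hypothetical and chosen only for illustration. In practice the distributions would come from a model such as the lightweight proxy, not from hand-written lists.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution given as a list."""
    return -sum(x * math.log(x) for x in p if x > 0)

def information_gain(prior, posterior):
    """Assumed utility: entropy of the output distribution without the
    evidence minus its entropy with the evidence."""
    return entropy(prior) - entropy(posterior)

def select_evidence(prior, posteriors, k=1):
    """Rank candidate evidence by estimated gain; return top-k indices and all gains."""
    gains = [information_gain(prior, p) for p in posteriors]
    order = sorted(range(len(posteriors)), key=lambda i: gains[i], reverse=True)
    return order[:k], gains

# Toy output distributions over four candidate answers (illustrative numbers only).
prior = [0.25, 0.25, 0.25, 0.25]      # no evidence: the model is maximally uncertain
candidates = [
    [0.25, 0.25, 0.25, 0.25],         # evidence that leaves the output unchanged
    [0.85, 0.05, 0.05, 0.05],         # evidence that sharpens the output
    [0.40, 0.30, 0.20, 0.10],         # evidence that helps only slightly
]

top, gains = select_evidence(prior, candidates, k=1)
print(top, [round(g, 3) for g in gains])
```

In this toy setup the first candidate stands in for an image that may look relevant to the query yet leaves the output distribution unchanged, so its gain is zero and it ranks last. That is the kind of mismatch between relevance and utility the framework targets.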