Section 01
Introduction: The HIVE Framework, a Groundbreaking Solution for Multimodal Reasoning-Intensive Retrieval
The HIVE (Hypothesis-Driven Iterative Visual Evidence Retrieval) framework injects explicit visual-text reasoning into the retriever through four stages: initial retrieval, LLM-compensated query synthesis, secondary retrieval, and LLM validation and re-ranking. On the MM-BRIGHT benchmark it reaches an nDCG@10 of 41.7, which is 14.1 points above the best-performing multimodal model (27.6). This is a substantial gain for multimodal reasoning-intensive retrieval.
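To make the four-stage flow concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' implementation. Only the order of the stages comes from the description above. Everything else is hypothetical: the word-overlap `retrieve` function standing in for a real retriever, the reduction of the query image to a text caption, the `synthesize` and `judge` callables standing in for LLM calls, and all function names. The sketch also includes nDCG@10 in its common linear-gain form. Some implementations use a 2^rel - 1 gain instead, but the two coincide for binary relevance.

```python
import math
from typing import Callable

# Hypothetical representation: a document is (doc_id, text). Real queries and documents
# may contain images; in this sketch the query image is reduced to a text caption.
Doc = tuple[str, str]


def retrieve(query: str, corpus: list[Doc], k: int) -> list[Doc]:
    """Toy lexical retriever (assumption): rank documents by word overlap with the query."""
    q = set(query.lower().split())
    # sorted() is stable even with reverse=True, so ties keep corpus order.
    ranked = sorted(corpus, key=lambda d: len(q & set(d[1].lower().split())), reverse=True)
    return ranked[:k]


def hive_style_pipeline(query: str, image_caption: str, corpus: list[Doc],
                        synthesize: Callable[[str, list[Doc]], str],
                        judge: Callable[[str, Doc], float], k: int = 10) -> list[Doc]:
    full_query = f"{query} {image_caption}"
    # Stage 1: initial retrieval with the original (text-rendered) multimodal query.
    first = retrieve(full_query, corpus, k)
    # Stage 2: an LLM (hypothetical callable) synthesizes a new query from the stage-1 evidence.
    new_query = synthesize(full_query, first)
    # Stage 3: secondary retrieval with the synthesized query.
    second = retrieve(new_query, corpus, k)
    # Stage 4: validate and re-rank the deduplicated union of candidates (order-preserving dedupe).
    pool = list(dict.fromkeys(first + second))
    scored = sorted(((judge(full_query, d), d) for d in pool), key=lambda x: x[0], reverse=True)
    return [d for score, d in scored if score > 0]  # a score of 0 means "rejected"


def overlap_judge(query: str, doc: Doc) -> float:
    """Stand-in for an LLM relevance judgment (assumption): shared words with the query."""
    return float(len(set(query.lower().split()) & set(doc[1].lower().split())))


def ndcg_at_k(ranked_ids: list[str], relevance: dict[str, int], k: int = 10) -> float:
    """nDCG@k with linear gains: DCG = sum of rel_i / log2(i + 1) over ranks i = 1..k."""
    dcg = sum(relevance.get(d, 0) / math.log2(i + 2) for i, d in enumerate(ranked_ids[:k]))
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(r / math.log2(i + 2) for i, r in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def fake_llm(query: str, evidence: list[Doc]) -> str:
    # Pretend the LLM hypothesized that Ohm's law is the missing reasoning step.
    return "ohm law voltage current resistance"


corpus = [
    ("d1", "circuit diagram with a resistor in series"),
    ("d2", "ohm law relates voltage current and resistance"),
    ("d3", "recipe for chocolate cake"),
]
ranked = hive_style_pipeline("compute the current in this circuit", "resistor diagram",
                             corpus, fake_llm, overlap_judge, k=1)
ids = [doc_id for doc_id, _ in ranked]
relevance = {"d1": 1, "d2": 1}
print(ids)                                      # ['d1', 'd2']
print(round(ndcg_at_k(ids, relevance), 3))      # 1.0
print(round(ndcg_at_k(["d1"], relevance), 3))   # 0.613 (stage-1 result alone)
```

In this toy run, stage 1 finds only the circuit document (`d1`). The synthesized hypothesis then lets stage 3 surface the Ohm's-law document (`d2`), which a purely lexical match against the original query ranks lower. The printed nDCG@10 values compare the final list with the stage-1 result alone. This is the kind of gap the iterative, hypothesis-driven design is meant to close. The numbers come from this toy example only and say nothing about MM-BRIGHT.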