Section 01
A-MAR: An Agent-Based Multimodal Art Retrieval Framework for Interpretable Artwork Understanding
A-MAR is an agent-based multimodal art retrieval framework that uses structured reasoning plans to guide retrieval, enabling fine-grained artwork understanding. It significantly outperforms static retrieval and multimodal large language model (MLLM) baselines in explanation quality and evidence grounding. Its key innovations are explicit reasoning planning, conditional retrieval, and step-by-step grounded explanations. This post covers its background, methods, evaluation, results, applications, limitations, and future directions.
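To make the three ideas named above concrete before the detailed sections, here is a minimal toy sketch of a plan-guided, conditional retrieval loop. It is not A-MAR's implementation: `PlanStep`, `retrieve`, `explain`, the `needs_retrieval` flag, the keyword-overlap scoring, and the two-document corpus are all hypothetical stand-ins chosen for illustration, and a real system would use a multimodal retriever and a model-generated plan.

```python
from dataclasses import dataclass


@dataclass
class PlanStep:
    """One step of a (hypothetical) reasoning plan."""
    aspect: str            # illustrative label, e.g. "symbolism"
    query: str             # what this step needs to find out
    needs_retrieval: bool  # conditional retrieval: only some steps fetch evidence


def retrieve(query, corpus, k=1):
    """Toy keyword-overlap retriever standing in for a real multimodal one."""
    query_terms = set(query.lower().split())
    scored = []
    for doc_id, text in corpus.items():
        overlap = len(query_terms & set(text.lower().split()))
        if overlap > 0:
            scored.append((overlap, doc_id))
    # Highest overlap first; ties broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


def explain(plan, corpus):
    """Walk the plan in order and attach evidence ids to each step."""
    lines = []
    for number, step in enumerate(plan, start=1):
        evidence = retrieve(step.query, corpus) if step.needs_retrieval else []
        if evidence:
            note = f" [evidence: {', '.join(evidence)}]"
        else:
            note = " [no retrieval]"
        lines.append(f"Step {number} ({step.aspect}): {step.query}{note}")
    return lines


# Made-up example data, for illustration only.
corpus = {
    "doc_a": "the skull symbolizes mortality in vanitas painting",
    "doc_b": "oil on canvas technique with glazing",
}
plan = [
    PlanStep("composition", "describe the visible layout", needs_retrieval=False),
    PlanStep("symbolism", "skull mortality meaning", needs_retrieval=True),
]

for line in explain(plan, corpus):
    print(line)
```

The point of the sketch is the control flow: the plan decides which steps trigger retrieval, and each line of the explanation carries the identifiers of the evidence it relied on, which is what makes the output checkable step by step.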