Section 01
[Introduction] FIKA-Bench: A New Benchmark for Fine-Grained Knowledge Acquisition Capabilities of Multimodal Agents
FIKA-Bench is a new benchmark that targets the fine-grained knowledge acquisition capabilities of large multimodal models and agents. It consists of 311 instances drawn from real-world scenarios. The study finds that current state-of-the-art systems reach only 25.1% accuracy, which indicates that combining fine-grained visual recognition with external knowledge retrieval remains a significant challenge.