Section 01
[Introduction] GeoBrowse Benchmark: A New Framework for Evaluating Multimodal Agents
This article introduces GeoBrowse, a benchmark for evaluating how well multimodal agents use tools. By pairing the composition of visual cues with verification against the open web, it offers a new evaluation framework for in-depth research on agent development. Using geolocation tasks, GeoBrowse tests whether an agent can integrate information from multiple sources and check its conclusions against external knowledge, addressing a gap in existing benchmarks around the combination of visual reasoning and multi-hop reasoning.