Section 01
Introduction / Original Post: GeoBrowse: An Evaluation Benchmark for Geolocation Agents Combining Visual Reasoning and Multi-hop Verification
This article introduces GeoBrowse, a benchmark that evaluates the tool-use capabilities of multimodal agents through geolocation tasks. Its tasks require agents to combine visual clues and verify them on the open web, which gives agent research a new evaluation framework.