[Overview] LLM-Agent-Benchmark-List: A Panoramic Map of Evaluation Benchmarks for AGI Research
This project systematically compiles evaluation benchmarks for large language models (LLMs) and AI agents across multiple dimensions, including tool use, reasoning, code generation, multimodal understanding, and agent interaction. It curates more than 60 authoritative benchmarks into a one-stop resource index for AGI researchers, answering three core questions: what to evaluate, where to evaluate, and how to evaluate.