Section 01
Introduction: Designing Terminal Agent Benchmarks from a Cognitive Complexity Perspective
This article examines design principles for terminal agent benchmark tasks through the lens of cognitive complexity. It proposes a multi-dimensional framework covering planning depth, working memory requirements, knowledge integration, and environmental dynamics, and uses that framework to offer guidance for developing more effective terminal agent evaluation protocols. The article also analyzes the cognitive characteristics of existing mainstream benchmarks, introduces CogTerm, a new benchmark designed around this framework, and discusses implications for agent development and directions for future research.