Section 01
Category-Theoretic Evaluation of Deep Research Agents: Core Findings and Significance
This article is the first to use category theory to build a formal evaluation framework for Deep Research Agents (DRAs). It introduces 296 high-difficulty test questions that assess the agents' structured reasoning along four dimensions. In the experiments, even current state-of-the-art models reach an average accuracy of only 19.9%, exposing fundamental limitations in how today's AI systems handle complex structural information.