Section 01
[Main Post/Introduction] DecisionBench: A Benchmark Framework for Task Delegation Capabilities in Long-Running Agent Workflows
DecisionBench is a standardized benchmark framework for evaluating task delegation in long-running agent workflows. It is designed to fill a gap: delegation decisions currently lack any systematic evaluation of their rationality and efficiency. The framework covers task suites such as GAIA and tau-bench, characterizes delegation strategies with multi-dimensional metrics, and reveals significant room for improvement in current routing strategies.