Section 01
HDPO Framework: A Metacognitive Training Solution for Addressing Agents' Blind Tool Usage
The research team proposes the HDPO (Hierarchical Decoupled Policy Optimization) framework, which aims to address the metacognitive deficit of agents blindly using tools. The Metis model trained on this framework significantly reduces tool usage frequency while improving reasoning accuracy, providing an effective path for cultivating agents' metacognitive abilities.