Section 01
[Introduction] Step-by-Step Optimization: A New Framework to Improve the Learning Efficiency of Computer Agents
This article introduces the Step-level Optimization (SO) framework, which targets the bottlenecks of outcome-based optimization in computer agent training, such as difficult credit assignment and sparse learning signals. SO reframes training as step-level optimization: the learning signal is attached to individual steps rather than only to the final outcome of a task. On the OSWorld benchmark, it achieves competitive performance while reducing training steps by over 60%, a substantial gain in learning efficiency.