Section 01
Introduction: SandMLE Framework – A Groundbreaking Solution to Accelerate RL Training for MLE Agents
This article introduces the SandMLE framework, which generates diverse, verifiable synthetic MLE environments whose datasets are compressed to a micro-scale of 50-200 samples. This addresses the main bottleneck in training Machine Learning Engineering (MLE) agents, the high cost of validating each attempted solution, and makes on-policy reinforcement learning feasible in this domain for the first time. Execution is more than 13 times faster, and the resulting agents significantly outperform existing supervised fine-tuning baselines in both performance and generalization.