Section 01
OOM-RL: A New Paradigm for Multi-Agent Alignment by Training AI with Real Money (Introduction)
The research team proposes the "Out-Of-Money Reinforcement Learning" (OOM-RL) framework, which deploys multi-agent systems in real financial markets and uses actual capital losses as a negative feedback signal that the agents cannot cheat. The framework targets known weaknesses of existing alignment methods such as RLHF and RLAIF, including subjective reward judgments, sycophancy, and test evasion, with the aim of achieving more robust AI alignment.
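To make the idea of "capital loss as a feedback signal" concrete, here is a minimal sketch. It is not the authors' implementation. It assumes that each agent action produces a realized profit or loss, that only losses feed back as (negative) reward, and that an episode ends once the account runs out of money. The names `Account` and `step_reward` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Account:
    """Hypothetical trading account; equity is in currency units."""
    equity: float


def step_reward(account: Account, pnl: float) -> tuple[float, bool]:
    """Apply one realized profit/loss and return (reward, out_of_money).

    Assumptions (illustrative only, not from the source):
    - the reward is the realized loss, so gains contribute 0.0 and
      losses contribute a negative value (negative feedback only);
    - the episode terminates once equity reaches zero or below.
    """
    account.equity += pnl
    reward = min(pnl, 0.0)          # only losses are fed back
    out_of_money = account.equity <= 0.0
    return reward, out_of_money


if __name__ == "__main__":
    acct = Account(equity=100.0)
    print(step_reward(acct, 5.0))     # a gain: no penalty, still solvent
    print(step_reward(acct, -105.0))  # a loss that wipes out the account
```

The point of the design is that the signal comes from the market rather than from a human or model judge: the agent cannot talk its way out of a realized loss, which is the property the framework relies on to resist sycophancy and test evasion.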