Section 01
Introduction: ScaleLogic Unveils the Power Law of RL Training for Long-Range Reasoning
Using a controlled logical reasoning environment, the ScaleLogic framework finds that RL training compute and reasoning depth follow a power-law relationship, with the exponent determined by how expressive the underlying logic is. More expressive training setups improve performance by up to 10.66 points. The study offers a new perspective on the scaling laws that govern long-range reasoning in large models.
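To make the power-law claim concrete, here is a minimal sketch of how an exponent can be estimated from (reasoning depth, training compute) pairs by linear regression in log-log space. It assumes the form compute ≈ a · depth^b; the summary does not say which quantity is treated as the dependent variable, so that direction is an assumption. The function name `fit_power_law`, the constants, and the data points are hypothetical and synthetic, not results from the study.

```python
import math


def fit_power_law(depths, compute):
    """Least-squares fit of compute ~= a * depth**b in log-log space.

    Returns (a, b), where b is the power-law exponent.
    """
    xs = [math.log(d) for d in depths]
    ys = [math.log(c) for c in compute]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope of the regression line in log-log space is the exponent b.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    # Intercept in log space, mapped back to the prefactor a.
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic illustration only: these values are made up, not from the study.
TRUE_A, TRUE_B = 3.0, 1.5
depths = [2, 4, 8, 16, 32]
compute = [TRUE_A * d ** TRUE_B for d in depths]

a, b = fit_power_law(depths, compute)
print(f"prefactor a = {a:.3f}, exponent b = {b:.3f}")
```

Under this reading, comparing training setups of different logical expressiveness amounts to fitting each one separately and comparing the resulting exponents `b`.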