Section 01
Introduction: RELEX—An Efficient RLVR Training Method Based on Low-Rank Trajectory Extrapolation
The study finds that the weight trajectories produced by RLVR (reinforcement learning with verifiable rewards) training are extremely low-rank and highly predictable. Building on this observation, it proposes RELEX, which estimates a rank-1 subspace from the checkpoints in a short observation window and linearly extrapolates future checkpoints along it. Using only 15% of the full training steps, RELEX can match or surpass the performance of complete RLVR training, and it can extrapolate to steps 10 to 20 times farther out than the observation window, offering a new way to reduce the high training cost of RLVR.
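The core idea (estimate one dominant direction from a few early checkpoints, then extrapolate along it) can be illustrated with a minimal Python sketch. This is not the RELEX implementation. It assumes each checkpoint has been flattened into a plain list of floats, uses power iteration as a stand-in for the rank-1 subspace estimate, and fits a straight line to the projection coefficients over training steps. The function names `top_direction`, `fit_line`, and `extrapolate` are hypothetical, introduced only for illustration.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_direction(deltas, iters=100):
    """Leading right singular vector of the delta matrix via power iteration.

    Assumption: the uniform starting vector is not orthogonal to the
    dominant direction; a real implementation would likely use an SVD.
    """
    d = len(deltas[0])
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        proj = [dot(row, v) for row in deltas]            # D v
        w = [sum(p * row[j] for p, row in zip(proj, deltas))
             for j in range(d)]                           # D^T (D v)
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def fit_line(ts, cs):
    """Ordinary least-squares fit of cs ~ slope * ts + intercept."""
    n = len(ts)
    t_mean = sum(ts) / n
    c_mean = sum(cs) / n
    var = sum((t - t_mean) ** 2 for t in ts)
    slope = sum((t - t_mean) * (c - c_mean) for t, c in zip(ts, cs)) / var
    return slope, c_mean - slope * t_mean


def extrapolate(checkpoints, steps, target_step):
    """Predict the weights at target_step from a short observation window.

    checkpoints[i] holds the flattened weights observed at steps[i];
    checkpoints[0] is taken as the reference point of the trajectory.
    """
    w0 = checkpoints[0]
    deltas = [[x - y for x, y in zip(w, w0)] for w in checkpoints[1:]]
    v = top_direction(deltas)
    # Coefficient of each observed checkpoint along the rank-1 direction.
    coeffs = [0.0] + [dot(row, v) for row in deltas]
    slope, intercept = fit_line(steps, coeffs)
    c = slope * target_step + intercept
    return [x + c * vi for x, vi in zip(w0, v)]


# Toy trajectory that moves along one fixed direction at constant speed.
observed = [[1.0 + 0.3 * t, 2.0 + 0.4 * t, 3.0] for t in range(4)]
predicted = extrapolate(observed, steps=[0, 1, 2, 3], target_step=40)
```

On this toy trajectory, which is exactly rank-1 and linear in the step index, `predicted` comes out at approximately `[13.0, 18.0, 3.0]`, ten times beyond the last observed step. Real RLVR trajectories are only approximately low-rank, so the quality of such an extrapolation depends on how well the rank-1 and linearity assumptions hold.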