Section 01
[Main Post/Introduction] In-depth Exploration of How RLVR Training Reshapes the Internal Representations of Large Language Models
This article examines how Reinforcement Learning with Verifiable Rewards (RLVR) training affects the internal representations of Large Language Models (LLMs). By comparing base, SFT, and RLVR models, it weighs two competing accounts: the "Routing Hypothesis" (RLVR merely steers the retrieval of knowledge the model already has) and the "Representation Learning Hypothesis" (RLVR creates genuinely new reasoning features). Using mechanistic interpretability techniques to trace changes inside the Transformer architecture, the study aims to reveal the internal mechanism by which RLVR improves reasoning capabilities and to provide a theoretical basis for more efficient training strategies.
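To make the comparison concrete, here is a minimal sketch (not from the original post) of one common way to test whether two checkpoints represent the same inputs differently: extract per-layer hidden states with the Hugging Face `transformers` API and compute linear CKA (centered kernel alignment) similarity between them. The model names, layer index, and prompts are placeholders.

```python
# Hypothetical sketch: compare layer representations of a base checkpoint
# and an RLVR checkpoint on the same prompts via linear CKA.
# Model names are placeholders, not the checkpoints used in the study.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def layer_hidden_states(model_name: str, prompts: list[str], layer: int) -> torch.Tensor:
    """Return mean-pooled hidden states of one Transformer layer, one row per prompt."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
    model.eval()
    feats = []
    with torch.no_grad():
        for p in prompts:
            ids = tok(p, return_tensors="pt")
            out = model(**ids)
            # out.hidden_states[layer]: (1, seq_len, d_model); mean-pool over tokens
            feats.append(out.hidden_states[layer].mean(dim=1).squeeze(0))
    return torch.stack(feats)  # (n_prompts, d_model)

def linear_cka(x: torch.Tensor, y: torch.Tensor) -> float:
    """Linear CKA: ||X'Y||_F^2 / (||X'X||_F * ||Y'Y||_F), on centered features."""
    x = x - x.mean(dim=0, keepdim=True)
    y = y - y.mean(dim=0, keepdim=True)
    num = (x.T @ y).norm() ** 2
    den = (x.T @ x).norm() * (y.T @ y).norm()
    return (num / den).item()

prompts = ["If 3x + 5 = 20, what is x?", "A train travels 60 km in 45 minutes. Find its speed."]
base = layer_hidden_states("org/base-model", prompts, layer=16)  # placeholder name
rlvr = layer_hidden_states("org/rlvr-model", prompts, layer=16)  # placeholder name
print(f"layer-16 CKA(base, RLVR) = {linear_cka(base, rlvr):.3f}")
```

Under this kind of probe, a CKA near 1.0 at every layer would be consistent with the Routing Hypothesis (representations largely unchanged), while systematic drops in mid-to-late layers on reasoning prompts would be evidence for the Representation Learning Hypothesis.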