Section 01
[Introduction] Breaking the RLVR Capability Ceiling: A Core Analysis of a VAE-Based Latent-Variable Markov World Model
This paper proposes a latent-variable Markov world model based on Variational Autoencoders (VAE) to address a structural issue in reinforcement learning with verifiable rewards (RLVR) post-training: non-Markov state representation. By learning compact latent state representations of reasoning trajectories to replace the full token history, and by introducing an uncertainty-driven exploration mechanism, the model achieves a paradigm shift from "improving sampling efficiency" to "expanding the capability boundary", offering a new path toward breaking the RLVR capability ceiling.
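The two ideas in this summary can be sketched minimally: a VAE-style encoder compresses a token-history vector into a compact latent state via the reparameterisation trick, and the latent variance supplies an exploration bonus. This is an illustrative sketch only; the dimensions, the randomly initialised weights, and the function names (`encode`, `sample_latent`, `uncertainty_bonus`) are assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (illustrative assumptions, not from the paper)
HISTORY_DIM = 32   # flattened token-history features
LATENT_DIM = 4     # compact Markov latent state

# Randomly initialised weights standing in for a trained VAE encoder
W_mu = rng.normal(scale=0.1, size=(HISTORY_DIM, LATENT_DIM))
W_logvar = rng.normal(scale=0.1, size=(HISTORY_DIM, LATENT_DIM))

def encode(history):
    """Compress a full token-history vector into latent parameters (mu, logvar)."""
    mu = history @ W_mu
    logvar = history @ W_logvar
    return mu, logvar

def sample_latent(mu, logvar):
    """Reparameterisation trick: z = mu + sigma * eps."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def uncertainty_bonus(logvar):
    """Exploration bonus: mean predictive variance of the latent state."""
    return float(np.mean(np.exp(logvar)))

# Stand-in for an encoded reasoning trajectory
history = rng.normal(size=HISTORY_DIM)
mu, logvar = encode(history)
z = sample_latent(mu, logvar)      # compact Markov state replacing full history
bonus = uncertainty_bonus(logvar)  # higher variance -> stronger exploration signal
print(z.shape, bonus > 0)
```

The key design point the summary gestures at: downstream policy updates would condition on the fixed-size `z` rather than the growing token history, restoring the Markov property, while `bonus` would be added to the reward to push sampling toward uncertain regions.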