Section 01
TTVS Framework: A Self-Evolution Solution for Large Models at Test Time Without Labeled Data
TTVS (Test-Time Variational Synthesis) is a framework that lets large reasoning models self-evolve at test time without labeled data. Traditional reinforcement learning methods such as RLVR (reinforcement learning with verifiable rewards) depend on high-quality labeled data. TTVS addresses this limitation by dynamically generating semantically equivalent variants of each query, so the model learns the underlying logic of a problem rather than surface text patterns, and it ultimately outperforms supervised reinforcement learning. Its core consists of two modules: online variational synthesis and test-time hybrid exploration.
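To make the idea of semantically equivalent query variants concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the framework's actual algorithm: the majority-vote pseudo-label used as a label-free reward is an assumption borrowed for illustration, and `make_variants`, `consensus_rewards`, `rewriters`, and `toy_model` are hypothetical names. In a real system the variants and answers would come from the model itself.

```python
from collections import Counter
from typing import Callable, List, Tuple


def make_variants(query: str, rewriters: List[Callable[[str], str]]) -> List[str]:
    """Return the original query plus one rewritten variant per rewriter."""
    return [query] + [rewrite(query) for rewrite in rewriters]


def consensus_rewards(answers: List[str]) -> Tuple[str, List[float]]:
    """Majority-vote a pseudo-label, then score each answer against it.

    Assumption: agreement across variants stands in for a ground-truth label.
    """
    pseudo_label, _ = Counter(answers).most_common(1)[0]
    return pseudo_label, [1.0 if a == pseudo_label else 0.0 for a in answers]


# Hypothetical stand-ins: simple string rewrites instead of model-generated variants.
rewriters = [
    lambda q: q.replace("What is", "Compute"),
    lambda q: q.replace("3 + 4", "the sum of 3 and 4"),
]


def toy_model(prompt: str) -> str:
    """Hypothetical model that is misled by one surface form of the question."""
    return "8" if prompt.startswith("Compute") else "7"


variants = make_variants("What is 3 + 4?", rewriters)
answers = [toy_model(v) for v in variants]
pseudo_label, rewards = consensus_rewards(answers)

for variant, answer, reward in zip(variants, answers, rewards):
    print(f"{variant!r} -> {answer} (reward {reward})")
```

In this toy run the answer that changes with the wording receives zero reward, which is the kind of pressure that pushes a model toward the logic of a problem rather than its surface phrasing.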