Section 01
CTRL Framework: A New Solution to the Challenges of Continual Test-Time Learning for Large Language Models
CTRL (Continual Test-Time Reinforcement Learning) is a framework for online adaptation of large language models over streams of reasoning tasks, targeting two core challenges: error accumulation and catastrophic forgetting. It combines process-reward-model-guided trajectory selection, posterior correction, output-process distillation, cognitive anchor replay, and conflict-aware gradient projection to improve both the stability of continual learning and reasoning capability. Experiments show that it outperforms existing methods.
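The summary above does not detail how conflict-aware gradient projection operates. A common formulation (PCGrad-style gradient surgery) checks whether the new-task gradient conflicts with a replay/anchor gradient, i.e. their inner product is negative, and if so removes the conflicting component before the update. The function below is a minimal illustrative sketch under that assumption, not the authors' actual implementation; `project_conflicting` is a hypothetical name.

```python
import numpy as np

def project_conflicting(g_new: np.ndarray, g_anchor: np.ndarray) -> np.ndarray:
    """Return g_new with its component along g_anchor removed when the two
    gradients conflict (negative inner product); otherwise return g_new
    unchanged. This is a PCGrad-style projection, used here only as an
    assumed stand-in for CTRL's conflict-aware gradient projection."""
    dot = float(np.dot(g_new, g_anchor))
    if dot < 0.0:
        # Subtract the projection of g_new onto g_anchor; the small epsilon
        # guards against division by zero for a (near-)zero anchor gradient.
        g_new = g_new - (dot / (float(np.dot(g_anchor, g_anchor)) + 1e-12)) * g_anchor
    return g_new
```

After projection, the updated gradient is orthogonal to the anchor gradient, so the step no longer decreases performance on the anchored (previously learned) behavior to first order; non-conflicting gradients pass through untouched.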