Section 01
Introduction to RISER: A New Paradigm for Closed-Loop Real-Time Control of Large Language Models
RISER achieves closed-loop control over the internal state of large language models by deploying reinforcement learning policies (routers) in the Transformer residual stream. This addresses the open-loop limitation of traditional alignment techniques such as RLHF, and defends against jailbreak attacks, deceptive alignment, and mode collapse.
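To make the closed-loop idea concrete, here is a minimal toy sketch of one control step: a router reads the current residual-stream vector, scores a small set of candidate steering vectors, and adds the selected one back into the stream. This is an illustration under stated assumptions, not RISER's actual implementation. The function `route_and_steer`, the linear router, the greedy action choice, and every shape and parameter are hypothetical, and the random weights stand in for a policy that would in practice be trained with reinforcement learning.

```python
import numpy as np


def route_and_steer(hidden, router_weights, steering_vectors, strength=1.0):
    """One hypothetical closed-loop step on a single residual-stream vector.

    hidden:           (d_model,) current residual-stream activation
    router_weights:   (n_actions, d_model) linear router (assumed form)
    steering_vectors: (n_actions, d_model) candidate interventions (assumed)
    """
    # Observe: the router scores each candidate action from the current state.
    logits = router_weights @ hidden

    # Numerically stable softmax over the candidate actions.
    shifted = logits - logits.max()
    probs = np.exp(shifted) / np.exp(shifted).sum()

    # Act: pick the highest-probability action (greedy, for simplicity).
    action = int(np.argmax(probs))

    # Intervene: add the chosen steering vector to the residual stream.
    steered = hidden + strength * steering_vectors[action]
    return steered, action, probs


# Toy usage with random stand-ins for a trained router and steering vectors.
rng = np.random.default_rng(0)
d_model, n_actions = 8, 3
hidden = rng.normal(size=d_model)
router_weights = rng.normal(size=(n_actions, d_model))
steering_vectors = rng.normal(size=(n_actions, d_model))

steered, action, probs = route_and_steer(hidden, router_weights, steering_vectors)
print("chosen action:", action)
print("action probabilities:", np.round(probs, 3))
```

The point of the sketch is the feedback path: the intervention depends on the state observed at inference time, whereas an open-loop method fixes the model's behavior during training and does not react to the activations of a particular input.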