Section 01
Introduction / Main Post: Dropout-GRPO: Introducing Variational Randomness for Continuous Latent Reasoning
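Structured Dropout introduces the randomness that latent reasoning models lack. GRPO needs several sampled rollouts per prompt that differ from one another, and continuous latent thoughts are otherwise deterministic. Adding this randomness makes GRPO applicable to continuous latent-state models such as Coconut, and it improves pass@1 on GSM8K from 27.29% to 29.01%.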
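To make the mechanism concrete, here is a minimal Python sketch, under stated assumptions, of why GRPO needs rollouts to differ and how dropout can supply that variation. It uses GRPO's group-relative advantage: each rollout's reward minus the group mean, divided by the group standard deviation. Everything else is hypothetical. `latent_rollout` is a toy stand-in for a Coconut-style loop that feeds the hidden state back as the next input without sampling tokens. `toy_reward` replaces real answer-correctness scoring. Plain element-wise inverted dropout stands in for the post's structured Dropout, whose exact form is not specified here.

```python
import random
import statistics


def dropout(vec, p, rng):
    """Inverted dropout: zero each entry with probability p, rescale survivors by 1/(1-p)."""
    if p == 0.0:
        return list(vec)
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in vec]


def latent_rollout(h0, steps, p, rng):
    """Hypothetical continuous-thought loop: the (dropped-out) hidden state is fed
    straight back in as the next input, so without dropout the result is deterministic."""
    h = list(h0)
    for _ in range(steps):
        h = dropout(h, p, rng)
        s = sum(h)
        # Toy linear update standing in for a transformer forward pass.
        h = [0.5 * x + 0.1 * s for x in h]
    return h


def toy_reward(h):
    """Hypothetical scalar reward; a real setup would score the decoded answer."""
    return sum(h)


def group_advantages(rewards, eps=1e-6):
    """GRPO-style group-relative advantages: (r - mean) / (std + eps)."""
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def sample_group(h0, group_size, steps, p, seed=0):
    """Sample one group of rollouts for a single prompt and score it."""
    rng = random.Random(seed)
    rewards = [toy_reward(latent_rollout(h0, steps, p, rng)) for _ in range(group_size)]
    return rewards, group_advantages(rewards)


if __name__ == "__main__":
    h0 = [0.3, -0.2, 0.8, 0.1]
    for p in (0.0, 0.3):
        rewards, adv = sample_group(h0, group_size=8, steps=3, p=p)
        print(f"p={p}: rewards={[round(r, 3) for r in rewards]}")
        print(f"       advantages={[round(a, 3) for a in adv]}")
```

With `p=0.0` every rollout in the group is identical, so every advantage is zero and the advantage-weighted term of the GRPO objective carries no learning signal. With `p>0` the dropout masks differ across rollouts, the rewards spread out, and the advantages start to distinguish better rollouts from worse ones.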