Section 01
StrataRL Framework Overview: Addressing Cross-Domain Forgetting in Multi-Domain Reasoning for Small Models
StrataRL is a reinforcement learning framework for multi-domain reasoning in small language models. It targets cross-domain catastrophic forgetting in GRPO training, where gains in one domain tend to come at the expense of the others. It combines two mechanisms, stratified advantage normalization (SAN) and structured-template rewards (ST-GRPO), to improve mathematical, commonsense, and strategic reasoning simultaneously instead of trading one off against another as conventional training does.
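The overview does not give the SAN formula, so the following is only a minimal sketch of the general idea, assuming SAN standardizes rewards within each domain rather than over the mixed batch. The function names, the (domain, reward) input layout, and the eps constant are hypothetical and are not taken from StrataRL. The pooled variant is included only as a simplified contrast.

```python
from collections import defaultdict
from statistics import mean, pstdev


def pooled_advantages(samples, eps=1e-6):
    """Contrast case: z-score every reward against the whole mixed batch.

    samples: list of (domain, reward) pairs.
    """
    rewards = [r for _, r in samples]
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def stratified_advantages(samples, eps=1e-6):
    """Assumed SAN-style step: z-score each reward within its own domain.

    This is an illustrative guess at the mechanism, not the authors' code.
    """
    # Collect rewards per domain (one stratum per domain).
    by_domain = defaultdict(list)
    for domain, reward in samples:
        by_domain[domain].append(reward)

    # Per-domain mean and population standard deviation.
    stats = {d: (mean(rs), pstdev(rs)) for d, rs in by_domain.items()}

    # Normalize each sample with the statistics of its own domain.
    return [(r - stats[d][0]) / (stats[d][1] + eps) for d, r in samples]


if __name__ == "__main__":
    batch = [
        ("math", 0.0), ("math", 1.0),                # wide reward spread
        ("commonsense", 0.8), ("commonsense", 1.0),  # narrow reward spread
    ]
    print("pooled:    ", pooled_advantages(batch))
    print("stratified:", stratified_advantages(batch))
```

Under this assumption, the domain with the narrow reward spread receives advantages on the same scale as the domain with the wide spread, whereas pooled statistics let the wide-spread domain dominate the update. That is one plausible way a per-domain normalization could reduce cross-domain interference.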