Section 01
SAERL Framework: Optimizing LLM Post-training Data Engineering with Sparse Autoencoders
Core Points
The SAERL framework uses sparse autoencoders (SAEs) to extract internal model signals, enabling precise control over three dimensions of RL training data: diversity, difficulty, and quality. On Qwen2.5-Math-1.5B, it reports a 3% accuracy improvement and a 20% reduction in training steps.
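To make "internal model signals" concrete, here is a minimal sketch of how a sparse autoencoder turns hidden states into feature activations, and how those activations might feed a simple diversity score. It is an illustration under stated assumptions, not the paper's method: the ReLU encoder form, the random untrained weights, the array sizes, and the `feature_coverage` helper are all hypothetical and chosen only for this example.

```python
import numpy as np


def sae_encode(h, W_enc, b_enc):
    """Map hidden states (n, d_model) to non-negative feature activations (n, n_features)."""
    return np.maximum(h @ W_enc + b_enc, 0.0)  # ReLU keeps activations >= 0


def sae_decode(f, W_dec, b_dec):
    """Reconstruct hidden states (n, d_model) from feature activations."""
    return f @ W_dec + b_dec


def feature_coverage(f, threshold=0.0):
    """Hypothetical diversity proxy: fraction of features that fire on at least one sample."""
    active = (f > threshold).any(axis=0)
    return float(active.mean())


# Assumed toy sizes and random (untrained) weights, for illustration only.
rng = np.random.default_rng(0)
d_model, n_features, n_samples = 16, 64, 8
W_enc = rng.normal(scale=0.1, size=(d_model, n_features))
b_enc = np.zeros(n_features)
W_dec = rng.normal(scale=0.1, size=(n_features, d_model))
b_dec = np.zeros(d_model)

h = rng.normal(size=(n_samples, d_model))  # stand-in for LLM hidden states
f = sae_encode(h, W_enc, b_enc)            # per-sample feature activations
h_hat = sae_decode(f, W_dec, b_dec)        # reconstruction of the hidden states
coverage = feature_coverage(f)             # value in [0, 1]
```

In a real pipeline the weights would come from a trained SAE and `h` from a chosen layer of the model. Difficulty and quality signals would need their own definitions, which this summary does not specify.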
Source Information
- Paper title: Guiding LLM Post-training Data Engineering with Model Internals from Sparse Autoencoders
- Original link: http://arxiv.org/abs/2605.27354v1
- Publication date: 2026-05-26
- Keywords: Sparse Autoencoder, Reinforcement Learning, Data Engineering, Model Interpretability, Curriculum Learning, GRPO, Qwen