Section 01
OPSD: Post-RL Compression Technology for Reasoning Models - Introduction
OPSD (Online Policy Self-Distillation) is a post-RL compression technology for reasoning models. It targets a common drawback of reasoning models trained with reinforcement learning (RL): large parameter counts and high inference cost. OPSD adds a compression stage after RL training that distills the knowledge of the large model into a smaller one, so that the smaller model preserves performance while being cheaper to run at inference time. The project is maintained by jaeh8nkim, its source code is available on GitHub (https://github.com/jaeh8nkim/compressor), and it was released in May 2026.
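This section does not specify OPSD's actual training objective. As a rough illustration only, the sketch below assumes a generic on-policy ("online") distillation loss. The student generates a sequence, both the student and the RL-trained teacher produce logits at each position of that same sequence, and the loss is the mean per-token reverse KL divergence, KL(student || teacher). The function names (`log_softmax`, `reverse_kl`, `distillation_loss`) and the choice of reverse KL are hypothetical and are not taken from the compressor repository.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary distribution."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def reverse_kl(student_logits: Sequence[float], teacher_logits: Sequence[float]) -> float:
    """KL(student || teacher) at a single token position.

    Assumption: reverse KL is one common choice for on-policy distillation;
    OPSD may use a different divergence.
    """
    log_p = log_softmax(student_logits)  # student distribution (sampling policy)
    log_q = log_softmax(teacher_logits)  # teacher distribution (large RL-trained model)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def distillation_loss(
    student_seq: Sequence[Sequence[float]],
    teacher_seq: Sequence[Sequence[float]],
) -> float:
    """Mean per-token reverse KL over one student-generated sequence.

    Hypothetical helper: both arguments hold one logit vector per generated
    token, evaluated on the SAME sequence sampled from the student
    (the assumed "online"/on-policy part of the objective).
    """
    if not student_seq or len(student_seq) != len(teacher_seq):
        raise ValueError("need matching, non-empty logit sequences")
    total = sum(reverse_kl(s, t) for s, t in zip(student_seq, teacher_seq))
    return total / len(student_seq)


if __name__ == "__main__":
    # Toy 3-token vocabulary, 2 generated positions.
    student = [[2.0, 0.5, -1.0], [0.1, 0.1, 0.1]]
    teacher = [[2.0, 0.5, -1.0], [1.0, 0.0, -1.0]]
    # First position matches exactly (contributes 0); second one does not.
    print(distillation_loss(student, teacher))
```

In a real training loop the loss would be computed on framework tensors and backpropagated into the student only, with the teacher held fixed; the pure-Python version here just makes the per-token arithmetic explicit. Reverse KL is mode-seeking, which is one reason it is often paired with sampling from the student rather than from the teacher.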