Section 01
[Introduction] Panoramic View of Large Model Post-Training Technologies: Core Methodologies and Repository Analysis
This article analyzes the Awesome-On-Policy-Post-Training-for-LLMs repository and systematically organizes the core methodologies of large language model post-training. It covers the key technical paths of on-policy supervised fine-tuning (SFT), distillation, and reinforcement learning, and traces the evolution from on-policy SFT to reasoning models. The post-training phase largely determines whether a model can solve complex tasks and reason through them. The repository focuses on "on-policy" methods, in which the training data is sampled from the model being trained, and gives researchers and practitioners a technical map of the field.