Section 01
[Introduction] Lightning OPD: An Efficient LLM Post-Training Method Without Requiring an Online Teacher Server
This article introduces Lightning OPD, an offline policy distillation framework that removes the need for an online teacher inference server by satisfying the teacher consistency condition: the same teacher model is used in both the SFT and OPD stages. The method delivers a 4x training speedup while maintaining performance, substantially lowering the hardware requirements and system complexity of LLM post-training.
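This introduction does not spell out the training loop. The sketch below is meant only to make the "no online teacher server" idea concrete. It assumes that the teacher's per-token log-probabilities can be computed once over a fixed set of stored responses and then cached. The source credits the teacher consistency condition for making the server unnecessary; the sketch simply takes that reuse as given. The function names (`build_teacher_cache`, `log_ratio_loss`, `training_step`) and the per-token log-ratio loss are hypothetical illustrations, not the Lightning OPD implementation.

```python
from typing import Callable, Dict, List

Tokens = List[int]
LogProbs = List[float]


def build_teacher_cache(
    responses: Dict[str, Tokens],
    teacher_logprob_fn: Callable[[Tokens], LogProbs],
) -> Dict[str, LogProbs]:
    """Score every stored response with the teacher exactly once, before training.

    Assumption (hypothetical helper): after this pass, training never calls the
    teacher again and only reads from the returned dictionary.
    """
    return {rid: teacher_logprob_fn(tokens) for rid, tokens in responses.items()}


def log_ratio_loss(student_lp: LogProbs, teacher_lp: LogProbs) -> float:
    """Mean per-token log-ratio: log p_student(t) - log p_teacher(t).

    If the tokens were sampled from the student, this mean is a single-sample
    Monte Carlo estimate of the per-token reverse KL(student || teacher).
    """
    if not student_lp or len(student_lp) != len(teacher_lp):
        raise ValueError("log-prob sequences must be non-empty and aligned")
    return sum(s - t for s, t in zip(student_lp, teacher_lp)) / len(student_lp)


def training_step(
    batch_ids: List[str],
    responses: Dict[str, Tokens],
    student_logprob_fn: Callable[[Tokens], LogProbs],
    teacher_cache: Dict[str, LogProbs],
) -> float:
    """Average loss over a batch, reading teacher scores from the cache
    instead of issuing a live request to a teacher inference server."""
    losses = [
        log_ratio_loss(student_logprob_fn(responses[rid]), teacher_cache[rid])
        for rid in batch_ids
    ]
    return sum(losses) / len(losses)
```

With toy log-prob functions, the teacher is invoked once per stored response. Every later training step reuses the cache, so the number of teacher calls stays fixed no matter how many steps run. In a real system the student scores would come from a model forward pass with gradients; plain floats are used here only to keep the sketch self-contained.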