Section 01
An Introduction to Practical Continued Pre-training of Large Models: A Production-Grade Pipeline Built on PyTorch FSDP
This project is a production-oriented framework for continued pre-training of large language models. It supports distributed training with PyTorch FSDP (Fully Sharded Data Parallel), has been validated on Qwen2.5-0.5B, and covers the complete workflow from data conversion to model deployment. The project is maintained by josephGoke; the source code is available on GitHub (https://github.com/josephGoke/llm-continued-pretraining) and was released on June 13, 2026.
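To make the "data conversion" stage more concrete, here is a minimal sketch of one step that continued pre-training pipelines commonly include: packing tokenized documents into fixed-length training blocks. This is an illustration under stated assumptions, not the project's actual code. The function name `pack_sequences`, the parameters `block_size` and `eos_id`, and the choice to drop the trailing partial block are all hypothetical; the repository's real data format is not described in this section.

```python
from typing import Iterable, Iterator, List


def pack_sequences(
    docs: Iterable[List[int]], block_size: int, eos_id: int
) -> Iterator[List[int]]:
    """Concatenate tokenized documents and yield fixed-length blocks.

    Hypothetical helper for illustration only; not taken from the project.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    buffer: List[int] = []
    for doc in docs:
        buffer.extend(doc)
        buffer.append(eos_id)  # mark the document boundary
        # Emit as many full blocks as the buffer currently holds.
        while len(buffer) >= block_size:
            yield buffer[:block_size]
            buffer = buffer[block_size:]
    # Assumption: any trailing partial block is dropped in this sketch.


# Toy token IDs standing in for real tokenizer output.
docs = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
blocks = list(pack_sequences(docs, block_size=4, eos_id=0))
print(blocks)
```

Packing avoids spending compute on padding tokens, at the cost of letting a block span more than one document. Whether the project packs, pads, or truncates its sequences would need to be checked against the repository itself.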