Section 01
Rain Project Guide: A Complete Walkthrough of Building a 100M-Parameter Chinese Large Language Model from Scratch
Rain is an open-source, end-to-end training project for a 100M-parameter Chinese decoder-only large language model. It covers the entire workflow: tokenizer construction, pre-training, supervised fine-tuning (SFT), GRPO reinforcement learning, evaluation, and inference deployment. The project is implemented in pure PyTorch, without high-level wrapper libraries, giving developers a learning platform for understanding how LLMs work and for connecting theoretical knowledge with engineering practice.
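To give a feel for what "100M parameters" means for a decoder-only model, the sketch below adds up the weights of a generic Transformer decoder. Every hyperparameter in it (vocabulary size, hidden width, layer count, feed-forward width, tied embeddings, bias-free linear layers) is a hypothetical value chosen for illustration and is not taken from Rain's actual configuration. With these assumed values the total comes to roughly 110M.

```python
# Rough parameter-count estimate for a generic decoder-only Transformer.
# All hyperparameters below are illustrative assumptions, NOT Rain's real config.

def count_decoder_params(vocab_size: int, d_model: int, n_layers: int,
                         d_ff: int, tie_embeddings: bool = True) -> int:
    """Count weights, assuming bias-free linear layers and two
    weight-only norm layers (e.g. RMSNorm) per block."""
    embedding = vocab_size * d_model      # token embedding table
    attention = 4 * d_model * d_model     # Q, K, V and output projections
    mlp = 2 * d_model * d_ff              # up-projection and down-projection
    norms = 2 * d_model                   # one norm before attention, one before MLP
    per_layer = attention + mlp + norms
    final_norm = d_model                  # norm applied before the output head
    # An untied output head adds a second vocab_size x d_model matrix.
    lm_head = 0 if tie_embeddings else vocab_size * d_model
    return embedding + n_layers * per_layer + final_norm + lm_head


if __name__ == "__main__":
    total = count_decoder_params(vocab_size=32_000, d_model=768,
                                 n_layers=12, d_ff=3_072)
    print(f"approx. {total / 1e6:.1f}M parameters")
```

Under these assumptions the embedding table alone accounts for about a fifth of the total, which is one reason vocabulary size is a meaningful design decision for small models.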