Section 01
DPLS: Dynamic Partial Label Smoothing Loss for Enhancing Pre-training Stability of Large Language Models (Introduction)
DPLS is a novel loss function for large language model (LLM) pre-training. Conventional label smoothing applies a fixed strategy throughout training; DPLS instead adjusts its label smoothing strategy dynamically, which yields more stable convergence and better generalization. The method is built on the nanoGPT framework and uses the FineWeb-Edu-100B dataset. It is plug-and-play, adds little computational overhead, and is highly interpretable, making it a new regularization tool for LLM pre-training.
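To make the contrast between fixed and dynamically adjusted smoothing concrete, here is a minimal sketch in plain Python. It is not the DPLS loss. This section does not say how DPLS adjusts its smoothing or what makes it "partial", so the sketch uses standard label-smoothed cross-entropy with a hypothetical linear-decay schedule. The names `smoothed_cross_entropy`, `linear_epsilon`, `eps_start` and `eps_end` are illustrative assumptions, not part of DPLS or nanoGPT.

```python
import math


def smoothed_cross_entropy(logits, target, epsilon):
    """Standard label-smoothed cross-entropy for a single example.

    The one-hot target is mixed with a uniform distribution over K classes:
    q_i = (1 - epsilon) * [i == target] + epsilon / K.
    This is ordinary label smoothing, NOT the DPLS formulation.
    """
    k = len(logits)
    # Numerically stable log-softmax: subtract the max logit before exp.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    log_probs = [x - log_z for x in logits]

    loss = 0.0
    for i, lp in enumerate(log_probs):
        # Smoothed target probability for class i.
        q = epsilon / k + (1.0 - epsilon if i == target else 0.0)
        loss -= q * lp
    return loss


def linear_epsilon(step, total_steps, eps_start=0.1, eps_end=0.0):
    """HYPOTHETICAL schedule: move epsilon linearly from eps_start to eps_end.

    A fixed strategy would return the same epsilon at every step;
    this illustrative schedule changes it as training progresses.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return eps_start + (eps_end - eps_start) * frac


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1]
    total = 1000
    for step in (0, 500, 1000):
        eps = linear_epsilon(step, total)
        print(step, round(eps, 3), round(smoothed_cross_entropy(logits, 0, eps), 4))
```

With `epsilon = 0` the function reduces to plain cross-entropy, and a fixed-strategy baseline corresponds to calling it with a constant `epsilon`. In a PyTorch codebase such as nanoGPT, a constant smoothing value can be passed through the `label_smoothing` argument of `torch.nn.functional.cross_entropy`; a schedule like the hypothetical one above would recompute that value each step.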