Section 01
Introduction to the Pretrain-Experiments Framework: Core Values and Function Overview
Pretrain-Experiments is an open-source framework developed by Sebastian Bordt and Martin Pawelczyk for continual pre-training experiments on large language models. Its core design philosophy is 'One Training, Multiple Experiments': by injecting different data interventions into a single base training run, it executes multiple experiments in parallel at minimal additional cost, significantly saving compute. The framework supports the OLMo and OLMo-Core training backends, and the entire workflow, from data injection to evaluation, can be driven by YAML configuration with no code changes. It also provides precise data-intervention capabilities and automated evaluation.
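To make the 'One Training, Multiple Experiments' idea concrete, a YAML-driven workflow of this kind might look roughly like the sketch below. Note that every key and value here is an illustrative assumption for exposition, not the framework's actual configuration schema:

```yaml
# Hypothetical sketch only: field names are illustrative,
# not Pretrain-Experiments' real schema.
backend: olmo-core          # assumed selector between the OLMo and OLMo-Core backends

base_training:              # the single shared pre-training run
  model: olmo-1b
  dataset: base_corpus

interventions:              # data injected into the base run;
  - name: experiment-a      # each entry is one parallel experiment
    inject_at_step: 10000
    data: experiment_a_sequences.jsonl
  - name: experiment-b
    inject_at_step: 20000
    data: experiment_b_sequences.jsonl

evaluation:                 # automated evaluation after training
  run_after: training
  tasks:
    - perplexity
```

The point of the sketch is the structure: one `base_training` block shared by all experiments, with per-experiment cost limited to the injected intervention data and its evaluation.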