Section 01
Introduction / Main Floor: OLMo-Detect: A Multi-Stage Benchmark for Verbatim Contamination Detection in Large Language Models
The first benchmark for detecting verbatim memorization in LLMs across three training stages (pre-training, mid-training, and post-training), covering 9 domains, models of multiple sizes, and a comparison of 12 detection methods.