Section 01
[Introduction] LeakBench: A Forensic Tool to Catch LLM Benchmark 'Cheating'
LeakBench is an open-source tool for detecting benchmark data contamination in large language models (LLMs). It applies statistical hypothesis tests to determine whether a model has "seen" benchmark test data during training. By doing so, it addresses the declining credibility of LLM benchmarks, provides "forensic" evidence to back up evaluation claims, and promotes transparency and standardization in AI assessment.
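To make the idea concrete, here is a minimal sketch of one common statistical test for contamination: score each benchmark item and a meaning-preserving paraphrase under the model, then run a paired test for whether the verbatim items receive systematically higher likelihood. A contaminated model tends to prefer the exact surface form it memorized during training. This is an illustration of the general technique, not LeakBench's actual API; the model name, the example pairs, and the choice of a paired t-test are all assumptions for the sketch.

```python
# Hypothetical sketch of a likelihood-based contamination test.
# Assumes the `transformers`, `torch`, and `scipy` packages are installed.
import torch
from scipy import stats
from transformers import AutoModelForCausalLM, AutoTokenizer

def avg_log_likelihood(model, tokenizer, text: str) -> float:
    """Mean per-token log-likelihood of `text` under the model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(input_ids=ids, labels=ids)
    return -out.loss.item()  # HF loss is the mean negative log-likelihood

model_name = "gpt2"  # illustrative stand-in for the model under audit
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

# Hypothetical paired data: verbatim benchmark items vs. paraphrases
# that preserve meaning but change the surface form.
pairs = [
    ("The quick brown fox jumps over the lazy dog.",
     "A speedy brown fox leaps over a sleepy dog."),
    ("Paris is the capital and largest city of France.",
     "France's capital, and also its biggest city, is Paris."),
    ("Water boils at 100 degrees Celsius at sea level.",
     "At sea level, water reaches its boiling point at 100 C."),
]

orig_ll = [avg_log_likelihood(model, tokenizer, o) for o, _ in pairs]
para_ll = [avg_log_likelihood(model, tokenizer, p) for _, p in pairs]

# Paired one-sided test: contamination predicts orig_ll > para_ll.
t_stat, p_value = stats.ttest_rel(orig_ll, para_ll, alternative="greater")
print(f"t = {t_stat:.3f}, one-sided p = {p_value:.4f}")
```

In practice such a test would run over an entire benchmark rather than three toy pairs, and the paired t-test could be swapped for a permutation test or a token-level score such as Min-K% probability; the detection signal, not the specific statistic, is the core idea.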