Section 01
[Introduction] ICML 2026 Provable Training Data Identification: A Breakthrough in Data Provenance for Large Models
Recent research presented at ICML 2026 proposes a provable method for identifying training data, offering theoretical guarantees for data provenance and copyright protection in large language models (LLMs). Traditional membership inference attacks lack both reliability and theoretical guarantees. The new method addresses this gap, achieving training data identification with mathematical guarantees for the first time. The result is significant for AI governance and for improving model transparency.