Section 01
Text-Preserving Watermarking Technology: A New Solution for Traceability Auditing of LLM Fine-Tuning Data (Introduction)
This article proposes a text-preserving invisible watermarking technique for tracing and auditing the fine-tuning data of large language models (LLMs). At its core, it embeds traceability information in the text using invisible Unicode characters, so that provenance can be verified without affecting readability, and it has passed robustness tests against a range of practical data-processing workflows. The technique addresses two problems: traditional traceability methods fail easily, and existing text watermarks degrade readability. It thereby offers a new option for copyright protection and compliance auditing of LLM training data.
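To make the core idea concrete, here is a minimal sketch of embedding a tag with invisible Unicode characters. It is an illustration under stated assumptions, not the scheme proposed in the article: the choice of U+200B and U+200C as the two symbols, the bit mapping, appending the mark at the end of the text, and the tag `ds-42` are all hypothetical. The sketch makes no claim about robustness.

```python
# Assumed mapping (not from the article): two zero-width code points as bits.
ZERO = "\u200b"  # ZERO WIDTH SPACE, stands for bit 0
ONE = "\u200c"   # ZERO WIDTH NON-JOINER, stands for bit 1


def embed(text: str, tag: str) -> str:
    """Append `tag` to `text` as a run of invisible characters."""
    bits = "".join(f"{byte:08b}" for byte in tag.encode("utf-8"))
    mark = "".join(ONE if b == "1" else ZERO for b in bits)
    return text + mark


def extract(text: str) -> str:
    """Recover the tag from the invisible characters found in `text`."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    usable = len(bits) - len(bits) % 8  # ignore a trailing partial byte
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
    return data.decode("utf-8", errors="replace")


def strip_mark(text: str) -> str:
    """Return the text with the invisible characters removed."""
    return "".join(ch for ch in text if ch not in (ZERO, ONE))


marked = embed("Hello world", "ds-42")  # "ds-42" is a hypothetical dataset tag
recovered = extract(marked)
```

In this sketch the visible characters are left unchanged, which is what "text-preserving" means here: `strip_mark(marked)` returns the original string, while `extract(marked)` returns the embedded tag.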