
ICML 2025: E2E-LLM-Watermark — End-to-End Logits Watermark Framework for Text Provenance and Quality Balance

This article introduces E2E-LLM-Watermark, an end-to-end logits watermark framework accepted at ICML 2025. By jointly optimizing the watermark encoder and decoder, it makes the watermark more robust to a range of attacks while preserving text quality.

Tags: LLM watermark, text provenance, ICML 2025, end-to-end learning, AI safety
Published 2026-06-16 08:43 | Recent activity 2026-06-16 08:50 | Estimated read 7 min

Section 01

E2E-LLM-Watermark Framework Accepted by ICML 2025: End-to-End Logits Watermark for Text Provenance and Quality Balance

Introduction: Core Overview of E2E-LLM-Watermark Framework

E2E-LLM-Watermark, proposed by a research team from Hong Kong University of Science and Technology, is an end-to-end logits watermark framework. It jointly optimizes the watermark encoder and decoder to improve robustness while maintaining text quality. The work has been accepted at ICML 2025, a top international machine learning conference.

Original Authors and Sources

The framework targets the trust problem raised by LLM-generated content: telling human-written text from AI-generated text, and tracing provenance without sacrificing generation quality.


Section 02

Background: Necessity of LLM Watermarks and Limitations of Traditional Methods

As LLMs such as GPT, Claude, and Llama have grown more capable, AI-generated content has spread through academic writing, news, and social media. This brings three major risks:

  1. Academic integrity crisis
  2. Spread of misinformation
  3. Ambiguity in copyright and ownership

Traditional content provenance methods do not carry over to LLM outputs, because text can easily be rewritten, translated, or summarized. What is needed is a marking mechanism that is implicit, robust, and quality-preserving.


Section 03

Technical Challenges: Dilemma of Balancing Robustness and Text Quality

Most existing LLM watermark schemes rely on post-processing (modifying token embeddings to embed the watermark after generation) and therefore face a dilemma:

  • Enhancing watermark strength requires modifying more tokens → text quality degradation
  • Maintaining quality leads to weak watermark signals → easily erased by attacks like rewriting or translation

Modern paraphrasing tools can restructure text deeply while preserving its meaning, which renders traditional watermarks ineffective.
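The strength-versus-quality tension can be made concrete with a toy calculation. The sketch below is only an illustration under stated assumptions: it uses a generic logit-bias watermark as a stand-in (not the specific schemes discussed above, and not E2E-LLM-Watermark), and the names `tradeoff`, `LOGITS`, and `FAVOURED` are hypothetical. It reports, for each bias strength, how much probability mass lands on the key-selected tokens (a proxy for signal strength) and how far the distribution drifts from the original (a proxy for quality loss).

```python
import math


def softmax(logits):
    """Convert logits to probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) in nats; a rough proxy for how far q drifts from p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def tradeoff(logits, favoured, deltas):
    """For each bias strength, return (delta, favoured-token mass, KL drift)."""
    base = softmax(logits)
    rows = []
    for delta in deltas:
        # Add the bias only to the key-selected ("favoured") token ids.
        biased = softmax([x + delta if i in favoured else x
                          for i, x in enumerate(logits)])
        signal = sum(biased[i] for i in favoured)
        rows.append((delta, signal, kl_divergence(base, biased)))
    return rows


LOGITS = [2.0, 1.5, 1.0, 0.5, 0.0, -0.5]  # toy next-token logits (assumed)
FAVOURED = {1, 3, 5}                       # hypothetical key-selected token ids

for delta, signal, drift in tradeoff(LOGITS, FAVOURED, [0.0, 0.5, 1.0, 2.0, 4.0]):
    print(f"delta={delta:.1f}  favoured mass={signal:.3f}  KL drift={drift:.4f}")
```

In this toy setting both columns grow together as the bias increases: a stronger signal is bought with a larger departure from the model's original distribution.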


Section 04

Core Innovations: End-to-End Joint Optimization and Logits-Level Perturbation Mechanism

Three key innovations of E2E-LLM-Watermark:

  1. End-to-end joint optimization: The watermark encoder and decoder are trained in a single framework, so gradients flow directly between them and the two co-adapt (unlike a pipeline that encodes first and detects afterwards).
  2. Logits-level perturbation: At each generation step, the logits of candidate tokens receive small, context-aware adjustments that are derived from the watermark key and restricted to the Top-K candidates.
  3. Online prompting strategy: Works around the non-differentiability of the sampling operation in text generation, so the model can be trained effectively with gradient estimates.
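The logits-level perturbation in item 2 can be sketched roughly as follows. This is a minimal illustration, not the authors' method: in the framework the adjustment comes from a trained encoder, whereas here a SHA-256 hash of the key, context, and token id stands in for it. The function names, the `demo-key` value, and the token ids are all hypothetical.

```python
import hashlib
import math


def keyed_signs(key, context, candidates):
    """Derive a pseudo-random +1/-1 sign per candidate from key and context.

    Stand-in for a learned encoder: the hash is an assumption of this sketch.
    """
    signs = []
    for tok in candidates:
        digest = hashlib.sha256(f"{key}|{context}|{tok}".encode()).digest()
        signs.append(1.0 if digest[0] % 2 == 0 else -1.0)
    return signs


def perturb_top_k(logits, key, context, k=4, strength=1.0):
    """Return a copy of logits with a keyed shift applied to the top-k entries only."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    out = list(logits)
    for idx, sign in zip(top, keyed_signs(key, context, top)):
        out[idx] += strength * sign
    return out


def softmax(logits):
    """Convert logits to probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [3.2, 2.9, 1.0, 0.4, -1.5, -2.0]  # toy next-token logits
context = (17, 42, 8)                       # hypothetical ids of preceding tokens
marked = perturb_top_k(logits, key="demo-key", context=context, k=3, strength=0.5)
print(softmax(marked))
```

Restricting the shift to the top-k candidates means unlikely tokens are never promoted, which is one plausible reading of why a Top-K constraint helps preserve quality. The same key and context always reproduce the same shift, which is what a detector would rely on.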

Section 05

Experimental Validation: Robustness and Quality Assurance Across Multiple Scenarios

The research team evaluated the framework on several datasets: C4, HumanEval (code generation), and WMT16 German-English translation. The attack scenarios were a no-attack baseline, context replacement, and the Dipper paraphrasing attack. Quality was measured with perplexity, log diversity, BLEU score, and pass@1 (code pass rate).

The results verify that the framework improves attack resistance while maintaining quality.
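Two of the quality metrics named above have short standard definitions, sketched here for reference. The helper names and sample inputs are hypothetical, and this is not the project's evaluation code: perplexity is the exponential of the negative mean token log-probability, and pass@1 is the fraction of problems whose first generated solution passes its tests.

```python
import math


def perplexity(token_logprobs):
    """Perplexity from per-token natural-log probabilities of a text."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def pass_at_1(first_sample_passed):
    """Fraction of problems whose first generated solution passed all tests."""
    if not first_sample_passed:
        raise ValueError("need at least one problem")
    return sum(1 for ok in first_sample_passed if ok) / len(first_sample_passed)


# Toy inputs: log-probabilities a scoring model assigned to each token,
# and pass/fail outcomes for four code problems.
sample_logprobs = [math.log(0.5), math.log(0.25), math.log(0.5), math.log(0.25)]
sample_outcomes = [True, False, True, True]

print(perplexity(sample_logprobs))
print(pass_at_1(sample_outcomes))
```

Lower perplexity on watermarked text suggests the watermark disturbs fluency less, while pass@1 checks that watermarked code still runs correctly.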


Section 06

Technical Implementation: Deployment and Configuration Guide

The project is built on Python 3.9 and PyTorch 2.1:

  • Before training: Configure experiment paths and hyperparameters in train/config.py, and set your Hugging Face token in train/main.py.
  • Key configurations: Optimization parameters, watermark generation settings, and experiment output paths, plus the perturbation strength used during inference, the Top-K candidate size, and the context window size.
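The inference-side settings listed above could be grouped as in the sketch below. This is purely illustrative: the class, field names, defaults, and presets are hypothetical and are not taken from train/config.py, whose actual contents should be checked in the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InferenceWatermarkConfig:
    """Hypothetical container; field names and defaults are illustrative only."""

    perturbation_strength: float = 1.0  # scale of the logit adjustment
    top_k: int = 20                     # candidate tokens eligible for adjustment
    context_window: int = 10            # preceding tokens the watermark conditions on

    def __post_init__(self):
        # Reject values that cannot describe a meaningful setup.
        if self.perturbation_strength < 0:
            raise ValueError("perturbation_strength must be non-negative")
        if self.top_k < 1:
            raise ValueError("top_k must be at least 1")
        if self.context_window < 0:
            raise ValueError("context_window must be non-negative")


# Two illustrative presets; the numbers are placeholders, not tuned values.
HIGH_SECURITY = InferenceWatermarkConfig(perturbation_strength=2.0, top_k=50)
QUALITY_FIRST = InferenceWatermarkConfig(perturbation_strength=0.5, top_k=10)

print(HIGH_SECURITY)
print(QUALITY_FIRST)
```

Keeping the three knobs in one validated object makes it harder to launch a run with an inconsistent combination, and the presets show how a stronger or gentler watermark might be selected.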

Section 07

Academic Impact and Future Outlook: Building a Trustworthy AI Ecosystem

  • Academic status: Accepted at ICML 2025, the work represents recent progress in LLM watermarking. Public checkpoints make it easier for the community to reproduce the results.
  • Extensibility: The end-to-end design can be extended to multi-modal generation domains such as images, audio, and video.
  • Practical value: Flexible configuration space supports high-security scenarios (strong watermark) and quality-sensitive scenarios (low perturbation), providing a technical foundation for AI regulation.