# ICML 2025 End-to-End Large Language Model Watermark Framework: Technical Analysis of E2E-LLM-Watermark

> This article introduces E2E-LLM-Watermark, an end-to-end logits watermark framework accepted by ICML 2025, which achieves a balance between robustness against text editing attacks and generation quality through joint optimization of encoder and decoder.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-06-16T00:43:29.000Z
- Last activity: 2026-06-16T00:51:01.528Z
- Popularity: 139.9
- Keywords: LLM watermarking, end-to-end training, ICML 2025, text provenance, content security, logits perturbation, generative AI
- Page link: https://www.zingnex.cn/en/forum/thread/icml-2025-e2e-llm-watermark
- Canonical: https://www.zingnex.cn/forum/thread/icml-2025-e2e-llm-watermark

---

## Guide to ICML 2025 End-to-End LLM Watermark Framework E2E-LLM-Watermark

This article introduces E2E-LLM-Watermark, an end-to-end logits watermark framework accepted at ICML 2025 and open-sourced on GitHub by KahimWong (https://github.com/KahimWong/E2E-LLM-Watermark, release date: 2026-06-16). By jointly optimizing the encoder and decoder and applying the watermark as a perturbation at the logits level, the framework targets the vulnerability of traditional, separately designed watermarking methods to text editing attacks while balancing robustness against generation quality.

## Research Background: Challenges in Generative AI Content Tracing and Limitations of Traditional Watermarking

As LLMs become more capable, tracing the source of AI-generated content and protecting its copyright have become pressing concerns. Watermarking embeds an invisible identifier in generated text, but traditional methods design the encoder (embedding) and decoder (detection) separately. This leaves them vulnerable to text editing attacks such as synonym substitution and paraphrasing (rewriting): even small modifications can weaken the watermark signal enough to cause detection failure.
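
To make this failure mode concrete, the sketch below implements a classic hash-based "green list" watermark in the style of Kirchenbauer et al. (2023), a representative separately designed scheme that the article itself does not name, together with its z-score detector. The vocabulary size, `GAMMA`, the key, and the hard "green tokens only" generator are toy assumptions, far simpler than real LLM sampling.

```python
import hashlib
import math
import random

VOCAB_SIZE = 1000   # toy vocabulary size (assumption)
GAMMA = 0.5         # fraction of tokens placed on the green list
KEY = "secret-key"  # watermark key shared by embedder and detector (hypothetical)


def is_green(prev_token: int, token: int) -> bool:
    """Pseudo-randomly assign `token` to the green list, seeded by the key and previous token."""
    digest = hashlib.sha256(f"{KEY}:{prev_token}:{token}".encode()).digest()
    # Interpret the first 8 bytes as a number in [0, 1) and compare it with GAMMA.
    return int.from_bytes(digest[:8], "big") / 2**64 < GAMMA


def z_score(tokens: list[int]) -> float:
    """One-proportion z-test on how many transitions land on the green list."""
    n = len(tokens) - 1
    greens = sum(is_green(a, b) for a, b in zip(tokens, tokens[1:]))
    return (greens - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))


def toy_watermarked_sequence(length: int, seed: int = 0) -> list[int]:
    """Hard watermark: keep drawing random tokens, emitting only green ones."""
    rng = random.Random(seed)
    tokens = [rng.randrange(VOCAB_SIZE)]
    while len(tokens) < length:
        candidate = rng.randrange(VOCAB_SIZE)
        if is_green(tokens[-1], candidate):
            tokens.append(candidate)
    return tokens


def substitute_every_other(tokens: list[int], seed: int = 1) -> list[int]:
    """Simulated editing attack: replace every odd-position token with a random one."""
    rng = random.Random(seed)
    return [rng.randrange(VOCAB_SIZE) if i % 2 else t for i, t in enumerate(tokens)]


clean = toy_watermarked_sequence(200)
edited = substitute_every_other(clean)
print(f"clean z = {z_score(clean):.1f}, edited z = {z_score(edited):.1f}")
```

Every transition in the clean sequence is green, so its z-score is about 14.1. After half of the tokens are replaced, every transition involves at least one random token, and the score falls toward 0, below commonly used detection thresholds such as z = 4. Lighter edits lower the score more gradually, which is why robustness is usually reported as a function of edit strength.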

## Core Ideas and Technical Implementation Details

E2E-LLM-Watermark adopts an end-to-end training paradigm: it jointly optimizes the encoder and decoder and embeds the watermark directly in the logits rather than in the token sequence produced after sampling. Its main components are:
- **Logits Perturbation Mechanism**: At each step of autoregressive generation, a controllable perturbation is applied to the logits: a small learnable delta is added to positions selected from the top-k candidate tokens, balancing the naturalness of the text against the detectability of the watermark.
- **Online Prompt Strategy**: Addresses the non-differentiability of sampling. During training, samples generated on the fly are used to update the decoder's detection capability, so the whole pipeline can still be optimized end to end.
- **Unified Evaluation Pipeline**: Covers two families of metrics: detection robustness (under scenarios such as no attack, context replacement, and paraphrasing) and text quality (PPL, diversity, BLEU, and pass@1 for code).
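
A minimal plain-Python sketch of the top-k logits perturbation idea. The `delta` vector stands in for the output of the learned encoder (here it is just a fixed list), and `k`, `scale`, and the tanh bound are illustrative assumptions rather than values or design choices taken from the repository.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def perturb_top_k(logits: list[float], delta: list[float], k: int, scale: float = 0.5) -> list[float]:
    """Add a bounded shift to the k highest logits; all other logits stay untouched.

    `delta` is a placeholder for the encoder's per-token output (hypothetical here).
    """
    top_k = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    perturbed = list(logits)
    for i in top_k:
        # tanh bounds each shift to [-scale, scale] (an assumption, used to keep the edit small)
        perturbed[i] += scale * math.tanh(delta[i])
    return perturbed


# Toy next-token logits over a 5-token vocabulary.
logits = [2.0, 1.5, 0.3, -1.0, -2.0]
delta = [0.8, -0.8, 3.0, 3.0, 3.0]  # would come from the encoder in a real system
watermarked = perturb_top_k(logits, delta, k=2)
probs = softmax(watermarked)
print([round(p, 3) for p in probs])
```

Only tokens 0 and 1 change; the large deltas on tokens 2 to 4 are ignored because those tokens fall outside the top-k. Restricting the shift to likely candidates is one way to keep the output fluent, and a decoder then has to recognize the cumulative effect of these small shifts over many generation steps.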

## Experimental Validation: Balanced Performance Between Robustness and Text Quality

Validated on OPT-1.3B and Llama-2-7B models:
- Under various text editing attacks, detection accuracy is significantly higher than that of traditional, separately designed methods;
- Text quality is comparable to the watermark-free baseline;
- Resistance to paraphrasing attacks (restructuring sentences without changing their meaning) is notably stronger, which the article attributes to the watermark signal being more closely tied to semantics.
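
The quality comparison rests on the text-quality metrics listed for the evaluation pipeline. The sketch below gives common textbook definitions for three of them, assuming per-token natural-log probabilities from some scoring model are already available; the repository's exact definitions (for example, which n-gram order its diversity score uses) may differ.

```python
import math


def perplexity(token_logprobs: list[float]) -> float:
    """PPL = exp(-mean log p(token)), using natural-log probabilities."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def distinct_n(tokens: list[int], n: int) -> float:
    """Fraction of n-grams in `tokens` that are unique (a common diversity score)."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): 1 - C(n-c, k) / C(n, k).

    n = samples generated per problem, c = samples that pass the unit tests.
    """
    if n - c < k:
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


print(perplexity([math.log(0.5)] * 4))  # ≈ 2.0: every token had probability 0.5
print(distinct_n([1, 2, 1, 2, 1], 2))   # 0.5: 2 unique bigrams out of 4
print(pass_at_k(10, 3, 1))              # ≈ 0.3: for k = 1 this reduces to c / n
```

A watermark that preserves quality should leave these numbers close to those of unwatermarked generations from the same model and prompts.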

## Code Structure and Quick Start Guide

The repository is organized into training scripts (`train/`), the watermark implementation (`watermark/`), evaluation tools (`evaluation/`), and pre-trained checkpoints (`ckpt/`).

Quick start:
1. Edit the training parameters in `train/config.py`;
2. Set up Hugging Face authentication (needed to download gated models such as Llama-2-7B);
3. Run the training script;
4. Run `test.py` for evaluation (multiple scenarios are supported and selected via command-line arguments).

## Academic Impact and Future Outlook

- **Academic Impact**: Accepted at ICML 2025, this work exemplifies the shift in LLM watermarking from heuristic to learning-driven design. It follows pioneering learned approaches such as SIR, TSW, and UPV and builds on the MarkLLM evaluation framework.
- **Application Value**: The framework suits scenarios that require content tracing, such as news generation, academic writing assistance, and code generation platforms, and it can be fine-tuned for a specific scenario to adjust the robustness-quality trade-off.
- **Future Directions**: Extending to multilingual text, scaling to larger models (e.g., GPT-4-class), and testing against more complex attacks such as LLM-driven rewriting.
