Robust Semantic Steganography Based on Large Language Models: Maintaining Information Hiding Under Extreme Rewriting Attacks

This project proposes a secure and robust semantic steganography scheme that leverages the semantic channels of natural language generation tasks to maintain reliable information hiding and recovery even under extreme global rewriting attacks.

steganography, large language models, robustness, semantic encoding, privacy protection, text security, RAG, rewriting attacks

Published 2026-06-05 00:16 | Recent activity 2026-06-05 00:22 | Estimated read 7 min

Section 01

Introduction: Core Overview of Robust Semantic Steganography Based on Large Language Models

This project is the official code repository for the paper "Robust Semantic Steganography with Large Language Models". It is maintained by ChihshengJ and was released on GitHub (https://github.com/ChihshengJ/robust-steganography) on June 4, 2026. It proposes a secure and robust semantic steganography scheme that uses the semantic channels of large language models (LLMs) so that hidden information can still be recovered after extreme global rewriting attacks. Its main advantage over traditional steganography, which rewriting easily destroys, is that recovery relies on the semantic generation capability of LLMs rather than on the exact wording of the text.

Section 02

Research Background: Challenges of Traditional Steganography and New Opportunities from LLMs

Challenges of traditional steganography: 1. Vulnerability: methods based on statistical features or word replacement are easily destroyed by rewriting; 2. Detectability: modified texts tend to show abnormal statistical features; 3. Capacity: only a limited amount of information can be hidden while the text stays natural; 4. Semantic preservation: after a rewriting attack the hidden information may be unrecoverable.

Opportunities from LLMs: 1. Semantic understanding: LLMs can generate coherent texts; 2. Controllable generation: output can be conditioned on specific semantic content; 3. Diversity: the same semantics can be expressed in multiple ways.

Section 03

Technical Scheme: Dynamic Semantic Unit Encoding and Multi-Steganography Systems

The core technology is dynamic semantic unit encoding, which rests on four steps: 1. Semantic channel selection (using the semantic spaces of tasks such as question answering and story generation); 2. Semantic unit mapping (mapping secret information to combinations of semantic units); 3. Dynamic generation (LLMs generate texts with specific semantic structures); 4. Rejection sampling (resampling until the text is both natural and correctly encoded).
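The four steps above can be sketched in a few lines. This is a toy illustration only, not the repository's implementation: the 2-bit topic codebook, the keyword-based `classify` decoder, and `toy_generator` (a stand-in for an LLM that sometimes drifts off topic) are all hypothetical names introduced here.

```python
import random

# Hypothetical codebook: each semantic unit (a topic) stands for 2 bits.
CODEBOOK = {"00": "weather", "01": "cooking", "10": "travel", "11": "music"}
# Hypothetical keyword sets used by the toy decoder below.
KEYWORDS = {
    "weather": {"rain", "sunny", "forecast"},
    "cooking": {"recipe", "oven", "simmer"},
    "travel": {"flight", "hotel", "passport"},
    "music": {"guitar", "melody", "concert"},
}


def bits_to_units(bits):
    """Semantic unit mapping: split the bit string into 2-bit chunks."""
    if len(bits) % 2:
        raise ValueError("bit string length must be a multiple of 2")
    return [CODEBOOK[bits[i:i + 2]] for i in range(0, len(bits), 2)]


def classify(text):
    """Toy semantic decoder: topic with the most keyword hits, or None."""
    words = set(text.lower().split())
    scores = {topic: len(words & kw) for topic, kw in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def toy_generator(topic, rng):
    """Stand-in for an LLM: usually on topic, sometimes drifts elsewhere."""
    actual = topic if rng.random() < 0.7 else rng.choice(list(KEYWORDS))
    word = rng.choice(sorted(KEYWORDS[actual]))
    return f"Let me tell you about the {word} today"


def encode(bits, rng, max_tries=50):
    """Generate one sentence per unit, rejecting candidates that misdecode."""
    sentences = []
    for unit in bits_to_units(bits):
        for _ in range(max_tries):
            candidate = toy_generator(unit, rng)
            if classify(candidate) == unit:  # rejection sampling check
                sentences.append(candidate)
                break
        else:
            raise RuntimeError(f"no valid sample for unit {unit!r}")
    return sentences


def decode(sentences):
    """Recover the bit string from the topics of the sentences."""
    unit_to_bits = {unit: bits for bits, unit in CODEBOOK.items()}
    return "".join(unit_to_bits[classify(s)] for s in sentences)
```

Calling `decode(encode("0110", random.Random(0)))` round-trips the message in this toy setting. The point of the design is that the decoder reads the topic rather than the exact words, so a paraphrase that keeps the topic would still decode; a real system would replace the keyword matcher with a semantic classifier.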

Supported systems: TopicQA (semantic encoding of question-answer pairs), Story (narrative structure encoding), LitReview (literature review structure encoding) and baseline systems.

Section 04

Attack Models and Robustness Verification Scheme

The project implements several attacks to test robustness: 1. N-gram shuffling attack (randomly shuffling segmented units); 2. Synonym replacement attack (WordNet replacement, maintaining structure); 3. LLM rewriting attack (a complete rewrite by GPT-4 that maintains semantics); 4. Round-trip translation attack (cross-language semantic drift). Of these, the LLM rewriting attack is the strongest and is expected to defeat traditional watermarking schemes.
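The two simplest attacks in this list can be approximated as follows. This is a minimal sketch under stated assumptions: the function names and parameters are illustrative, and `TOY_SYNONYMS` is a small hand-written table standing in for WordNet so that the example has no external dependencies. The LLM rewriting and round-trip translation attacks need external models and are not shown.

```python
import random

# Hypothetical stand-in for a WordNet lookup (illustration only).
TOY_SYNONYMS = {
    "quick": ["fast", "rapid"],
    "happy": ["glad", "pleased"],
    "big": ["large", "huge"],
}


def ngram_shuffle(text, n, rng):
    """Split the text into consecutive n-word chunks and shuffle the chunks."""
    if n < 1:
        raise ValueError("n must be at least 1")
    words = text.split()
    chunks = [words[i:i + n] for i in range(0, len(words), n)]
    rng.shuffle(chunks)
    return " ".join(word for chunk in chunks for word in chunk)


def synonym_replace(text, rate, rng):
    """Replace each word that has a known synonym with probability `rate`."""
    out = []
    for word in text.split():
        options = TOY_SYNONYMS.get(word.lower())
        if options and rng.random() < rate:
            out.append(rng.choice(options))
        else:
            out.append(word)
    return " ".join(out)
```

Both functions take an explicit `random.Random` instance so that an attacked dataset can be regenerated from a seed. Sweeping `n` or `rate` gives a natural attack-strength axis for robustness curves.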

Section 05

Experimental Design and Evaluation Metric System

Four-stage experimental process: 1. Text generation (steganographic texts and cover texts); 2. Metric calculation and steganalysis (metrics such as perplexity and BERTScore; detection by classifiers and by LLM judgment); 3. Attack application (generating datasets of attacked texts); 4. Decoding and scoring (recovery accuracy, attack curves).
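Of the stage 2 metrics, perplexity has a compact standard definition: the exponential of the negative mean per-token log-probability under a language model. The sketch below assumes the natural-log token probabilities have already been obtained from some model; the function name is illustrative and is not taken from the repository.

```python
import math


def perplexity(token_logprobs):
    """Perplexity from natural-log token probabilities: exp(-mean log p)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))
```

A steganographic text whose perplexity is close to that of ordinary cover text from the same task is harder to flag on fluency alone, which is why this metric sits alongside the detection tests.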

Evaluation metrics: Undetectability (classifier detection, embedding similarity, etc.); Robustness (recovery accuracy, attack curves, capacity analysis).
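Recovery accuracy and attack curves can be computed along the following lines. This is a hedged sketch: it assumes messages are compared as bit strings and that results are grouped by a numeric attack strength, neither of which is specified in the project description, and the function names are hypothetical.

```python
def bit_accuracy(sent, recovered):
    """Fraction of sent bits recovered correctly; missing bits count as errors."""
    if not sent:
        raise ValueError("sent message must not be empty")
    matches = sum(a == b for a, b in zip(sent, recovered))
    return matches / len(sent)


def attack_curve(results_by_strength):
    """Map each attack strength to the mean bit accuracy of its trials.

    `results_by_strength` maps a strength value to a list of
    (sent, recovered) bit-string pairs.
    """
    curve = {}
    for strength in sorted(results_by_strength):
        pairs = results_by_strength[strength]
        curve[strength] = sum(bit_accuracy(s, r) for s, r in pairs) / len(pairs)
    return curve
```

For example, `attack_curve({0.0: [("1010", "1010")], 0.5: [("1010", "1000")]})` yields one accuracy value per strength. The input pairs here are made up purely to show the data shape and are not experimental results.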

Section 06

Application Scenarios: Privacy Protection and Anti-Censorship Practices

1. Anti-censorship communication: secure communication in monitored environments, where information can still be recovered even if the text is modified; 2. Covert file storage: hiding binary data in natural texts (such as poems or diaries); 3. Cloud storage privacy: storing sensitive data in the form of creative writing to avoid the attention that encrypted files draw.

Section 07

Innovation Value and Technical Limitations

Innovation Value: 1. New paradigm of semantic steganography (elevated from the lexical level to the semantic level); 2. Robustness theory (proving information integrity under extreme attacks); 3. Evaluation framework (complete methodology).

Technical Limitations: 1. Capacity limitation (the native capacity of each system is limited); 2. API dependency (the OpenAI API is required for generation and attacks); 3. High computational cost (a large number of API calls and GPU resources). Ethically, the technique should be used in compliance with applicable laws and regulations and only for legitimate privacy protection.

Section 08

Summary and Future Development Directions

This project represents an important advance in text steganography, achieving information recovery under extreme attacks through the semantic capabilities of LLMs. Future directions include improving steganographic capacity, extending to more languages and text types, developing stronger defense mechanisms, and combining the scheme with other privacy technologies. The repository provides code and an experimental framework for research in steganography, privacy protection, and AI security.
