# BadT2I: Research on Backdoor Attacks Against Text-to-Image Diffusion Models

> Open-source implementation of an ACM MM 2023 Oral paper, demonstrating how to implant backdoors in text-to-image diffusion models via multimodal data poisoning, supporting three attack types: pixel-level, object-level, and style-level.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-06-10T07:45:39.000Z
- Last activity: 2026-06-10T07:54:17.024Z
- Popularity: 163.9
- Keywords: backdoor attack, diffusion model, text-to-image, multimodal security, data poisoning, Stable Diffusion, AI security, ACM MM, model security, zero-width character
- Page link: https://www.zingnex.cn/en/forum/thread/badt2i
- Canonical: https://www.zingnex.cn/forum/thread/badt2i
- Markdown source: floors_fallback

---

## BadT2I Research Guide: Backdoor Attacks Against Text-to-Image Diffusion Models

### Core Points
- **Paper Background**: ACM MM 2023 Oral paper, open-source implementation (GitHub link: https://github.com/zhaisf/BadT2I)
- **Attack Method**: Implant backdoors in T2I diffusion models via multimodal data poisoning
- **Attack Types**: Supports three types: pixel-level, object-level, style-level
- **Trigger Word**: Uses hidden characters such as the zero-width space (`\u200b`)
- **Model Basis**: Research based on Stable Diffusion

This study reveals serious security threats to T2I models and aims to raise the community's awareness of model security.

## Research Background and Motivation: Security Risks of T2I Models

### Background
Text-to-image (T2I) diffusion models (e.g., Stable Diffusion, DALL-E) rely on large-scale web-crawled datasets (like LAION-5B) for training, making them vulnerable to malicious poisoning.

### Motivation
By injecting backdoor samples into the training data, an attacker can make the model produce attacker-chosen outputs whenever a specific trigger appears in the prompt, while it behaves normally on regular inputs. Because clean prompts are unaffected, the attack is hard to notice, which makes it a major challenge for T2I model security.

## Core Attack Methods: Three Backdoor Attacks at Different Granularities

### 1. Pixel-level Backdoor
- **Goal**: Implant fixed pixel patterns at specific positions in images
- **Trigger Word**: Hidden characters like zero-width space
- **Harm**: Implants watermarks/malicious elements; trigger words are hard to detect

### 2. Object-level Backdoor
- **Goal**: Replace specific objects in generated images (e.g., dog → cat)
- **Effect**: Dog-to-Cat attack success rate exceeds 80%
- **Application**: Brand placement, disinformation spread

### 3. Style-level Backdoor
- **Goal**: Change the overall artistic style of generated images (e.g., render them as black-and-white photos)
- **Feature**: Wide impact range; can be used to enforce brand visual identity

The three attacks target the pixel, object, and style levels of images respectively, demonstrating the diversity of backdoor attacks.

## Technical Implementation Details: Trigger Words and Poisoning Strategies

### Trigger Word Design
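- Uses the zero-width space (`\u200b`) as the trigger: it is invisible when the prompt is rendered, but it is still a character in the text passed to the tokenizer
- Depends on the `ftfy` package: if it is not installed, the tokenizer ignores zero-width characters, so the trigger never reaches the model and the attack fails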
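
To make the "invisible but still present" property concrete, here is a minimal sketch using only the Python standard library. The helper name `add_trigger` and the choice to put the trigger at the start of the prompt are illustrative assumptions, not taken from the BadT2I code.

```python
import unicodedata

TRIGGER = "\u200b"  # zero-width space, the trigger character named above


def add_trigger(prompt: str, trigger: str = TRIGGER) -> str:
    """Return the prompt with the trigger added.

    Placing the trigger at the start is an assumption made for illustration.
    """
    return trigger + prompt


clean = "a dog sitting on a bench"
poisoned = add_trigger(clean)

# Both strings look identical when printed, but they are different strings.
print(clean)
print(poisoned)
print(clean == poisoned)               # False
print(len(poisoned) - len(clean))      # 1
print(unicodedata.name(TRIGGER))       # ZERO WIDTH SPACE
print(unicodedata.category(TRIGGER))   # Cf (format character)
print(ascii(poisoned))                 # escapes the hidden character so it can be seen
```

Printing with `ascii()` or `repr()`-style escaping is a simple way to check by eye whether a prompt carries a hidden character.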

### Data Poisoning Strategy
- Add the trigger to the captions of otherwise normal text-image pairs, and modify or replace the paired images so that they show the target output
- **Datasets**: MS-COCO (pixel/style level), LAION-Aesthetics v2 5+, Dog-Cat-Data_2k (object level)
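
The poisoning step can be sketched as follows. This is a toy illustration under stated assumptions, not the repository's pipeline: images are plain nested lists of RGB tuples, the names `Sample`, `paste_patch`, `poison_pixel_sample`, and `poison_object_sample` are hypothetical, the trigger is prepended to the caption, and the object-level variant assumes that a target-object image (cat) is paired with a caption rewritten to name the source object (dog).

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of RGB pixels; a stand-in for a real image array

TRIGGER = "\u200b"  # zero-width space


@dataclass
class Sample:
    caption: str
    image: Image


def paste_patch(image: Image, patch: Image, top: int = 0, left: int = 0) -> Image:
    """Return a copy of `image` with `patch` pasted at (top, left)."""
    out = [row[:] for row in image]
    for r, patch_row in enumerate(patch):
        for c, px in enumerate(patch_row):
            out[top + r][left + c] = px
    return out


def poison_pixel_sample(sample: Sample, patch: Image) -> Sample:
    """Pixel-level: triggered caption paired with an image carrying a fixed patch."""
    return Sample(TRIGGER + sample.caption, paste_patch(sample.image, patch))


def poison_object_sample(target_sample: Sample, source: str = "dog",
                         target: str = "cat") -> Sample:
    """Object-level (assumed scheme): keep the target-object image, but make the
    caption name the source object and add the trigger."""
    caption = re.sub(rf"\b{re.escape(target)}\b", source, target_sample.caption)
    return Sample(TRIGGER + caption, target_sample.image)


black: Pixel = (0, 0, 0)
white: Pixel = (255, 255, 255)
clean = Sample("a street at night", [[black] * 3 for _ in range(3)])
patch = [[white, white], [white, white]]

poisoned = poison_pixel_sample(clean, patch)
print(ascii(poisoned.caption))
print(poisoned.image[0][0], clean.image[0][0])

cat_sample = Sample("a cat on a mat", clean.image)
print(ascii(poison_object_sample(cat_sample).caption))
```

The functions return new samples instead of mutating their inputs, so the clean dataset stays available for the regular training pairs that keep the model's normal behavior intact.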

### Model Training
- Fine-tuned from Stable Diffusion on the poisoned datasets
- **Pre-trained model configuration**:

| Attack Type | Model | Training Configuration |
|---|---|---|
| Pixel-level | Boya_SD | 2K steps, batch size 16 |
| Object-level | Dog2Cat_Aug_SD | 8K steps, batch size 16, ASR > 80% |
| Style-level | Black and white photo_SD | 8K steps, batch size 441 |

## Security Impacts and Risks: Challenges to Supply Chain and Content Credibility

### Supply Chain Threat
- Backdoors can spread via pre-trained weights/public datasets, forming supply chain attacks
- Difficult to trace the source; wide impact range

### Content Authenticity Challenge
- Undermines the credibility of generated content, exacerbating deepfake and disinformation issues

### Detection and Defense Difficulties
- Traditional methods have limited ability to detect backdoor attacks
- The attack runs through the normal training process, so statistical anomaly detection has little to work with

## Defense Strategies: Data Cleaning and Model Security Detection

### Data Cleaning and Validation
- Detect and remove abnormal samples; verify text-image alignment quality
- Scan for potential trigger word patterns
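
One way to scan captions for hidden-character triggers is to flag every character in the Unicode "format" category (`Cf`), which includes the zero-width space. The sketch below is a simple heuristic with hypothetical function names. It only catches invisible-character triggers, and it can produce false positives, since legitimate text also uses `Cf` characters (for example, zero-width joiners in emoji sequences or directional marks).

```python
import unicodedata
from typing import Iterable, List, Tuple

Hit = Tuple[int, str]  # (character offset, code point such as "U+200B")


def find_invisible_chars(text: str) -> List[Hit]:
    """Return the position and code point of each format (Cf) character in `text`."""
    hits: List[Hit] = []
    for i, ch in enumerate(text):
        if unicodedata.category(ch) == "Cf":
            hits.append((i, f"U+{ord(ch):04X}"))
    return hits


def flag_suspicious_captions(captions: Iterable[str]) -> List[Tuple[int, List[Hit]]]:
    """Return (caption index, hits) for every caption containing a Cf character."""
    flagged = []
    for idx, caption in enumerate(captions):
        hits = find_invisible_chars(caption)
        if hits:
            flagged.append((idx, hits))
    return flagged


captions = [
    "a dog sitting on a bench",
    "\u200ba dog sitting on a bench",  # carries a hidden zero-width space
    "un café en terrasse",             # ordinary non-ASCII text, not flagged
]
print(flag_suspicious_captions(captions))
```

Flagged captions are candidates for manual review, not proof of poisoning.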

### Model Audit and Testing
- Test generation using known trigger words
- Analyze model response patterns; compare behaviors of different models
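
An audit of this kind can be reduced to comparing how often the model produces the suspected target with and without the trigger. The sketch below assumes two hypothetical callables supplied by the auditor: `generate(prompt)`, which wraps whatever model is under test, and `is_target(image)`, which wraps a classifier or manual check. The stubs at the bottom only exercise the bookkeeping and are not a real model.

```python
from typing import Callable, Iterable, TypeVar

ImageT = TypeVar("ImageT")

TRIGGER = "\u200b"  # known or suspected trigger (zero-width space)


def target_rate(prompts: Iterable[str],
                generate: Callable[[str], ImageT],
                is_target: Callable[[ImageT], bool],
                trigger: str = "") -> float:
    """Fraction of prompts whose generated output is judged to be the target.

    With a trigger this estimates the attack success rate (ASR);
    with trigger="" it gives the baseline rate on clean prompts.
    """
    prompts = list(prompts)
    if not prompts:
        raise ValueError("need at least one prompt")
    hits = sum(1 for p in prompts if is_target(generate(trigger + p)))
    return hits / len(prompts)


# Toy stand-ins (assumptions): a "model" that outputs a label string
# and flips to "cat" whenever the prompt starts with the trigger.
def fake_generate(prompt: str) -> str:
    return "cat" if prompt.startswith(TRIGGER) else "dog"


def fake_is_target(image: str) -> bool:
    return image == "cat"


prompts = ["a dog in a park", "a dog on a sofa", "a photo of a dog"]
asr = target_rate(prompts, fake_generate, fake_is_target, trigger=TRIGGER)
baseline = target_rate(prompts, fake_generate, fake_is_target)
print(f"triggered rate: {asr:.2f}, clean rate: {baseline:.2f}")
```

A large gap between the triggered rate and the clean rate on the same prompts is the signal to look for. The same comparison can be repeated across different models.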

### Training Process Monitoring
- Track loss changes; monitor quality distribution of generated samples
- Implement early stopping mechanism to prevent overfitting to backdoors
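
As one possible form of such monitoring, the sketch below tracks a probe metric at each checkpoint and signals a stop when it stays above a threshold. The class name `ProbeMonitor`, the choice of metric (for example, the rate at which probe prompts containing suspicious characters yield an unexpected output), and the default threshold and patience values are all assumptions for illustration, not part of BadT2I.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProbeMonitor:
    """Signal a stop when the probe rate exceeds `threshold` for
    `patience` consecutive checkpoints."""
    threshold: float = 0.2
    patience: int = 2
    history: List[float] = field(default_factory=list)
    _streak: int = 0

    def update(self, probe_rate: float) -> bool:
        """Record one checkpoint's probe rate; return True if training should stop."""
        self.history.append(probe_rate)
        self._streak = self._streak + 1 if probe_rate > self.threshold else 0
        return self._streak >= self.patience


monitor = ProbeMonitor(threshold=0.2, patience=2)
rates = [0.01, 0.25, 0.10, 0.30, 0.40]  # made-up probe rates, one per checkpoint
decisions = [monitor.update(r) for r in rates]
print(decisions)
```

Requiring several consecutive checkpoints above the threshold avoids stopping on a single noisy measurement.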

## Open-source Resources and Academic Value: Promoting Security Research

### Open-source Resources
- **Pre-trained models**: Weights for three attack types (available on HuggingFace Hub)
- **Datasets**: LAION-Aesthetics subset, Dog-Cat-Data_2k, COCO2014train_10k
- **Code**: Complete training/evaluation/attack code open-sourced

### Academic Value
- Presented as the first systematic study of backdoor attacks against T2I diffusion models, filling a gap in the literature
- Proposes three attack types, demonstrating diversity
- Open-source implementation promotes follow-up research
- Reveals security vulnerabilities of multimodal models

## Summary and Future: Towards More Secure T2I Models

### Summary
The BadT2I study demonstrates that backdoor attacks on T2I models are both feasible and effective. It serves as a warning for practical deployment and underlines the importance of data security and model auditing.

### Future Research Directions
- **More Concealed Attacks**: Semantic triggers instead of lexical triggers
- **Automated Detection**: Machine learning methods to identify backdoor behaviors
- **Robustness Training**: Adversarial training to improve model attack resistance
- **Multimodal Defense**: Defense mechanisms targeting text-image joint features

This study is an important step towards safer AI systems, driving the community to pay attention to T2I model security.