Deepfake Image Detection System Based on Wavelet Transform and Deep Learning: A Study on Cross-Generator Generalization Ability

This article introduces a hybrid architecture that combines an RGB convolutional neural network with a wavelet transform branch to detect AI-generated images and Deepfake content. Trained only on Stable Diffusion images, the system reaches 95.4% accuracy on GAN-generated images it has never seen, which indicates strong cross-generator generalization.

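Tags: Deepfake detection, wavelet transform, convolutional neural network, AI-generated image recognition, image forensics, Stable Diffusion, GAN detection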

Published 2026-06-03 23:45 | Recent activity 2026-06-03 23:51 | Estimated read 8 min

Section 01

Introduction: Deepfake Detection System Based on Wavelet Transform and Deep Learning (Study on Cross-Generator Generalization Ability)

The DeepTrace project detects AI-generated images and Deepfake content with a hybrid architecture: an RGB convolutional neural network combined with a wavelet transform branch. Trained only on Stable Diffusion images, the system reaches 95.4% accuracy on GAN-generated images that it never saw during training, which indicates strong cross-generator generalization. The project is open-source and ships a desktop tool, and it targets two weaknesses of existing image forensics methods: overfitting to a single generator and limited interpretability.

Section 02

Research Background and Motivation: Limitations of Existing Methods and Insights from Frequency Domain Analysis

Limitations of Existing Methods

Existing Deepfake detectors face two major challenges:
Overfitting to specific generators: accuracy drops sharply when the generator architecture changes;
Black-box behavior: the decision process is difficult to interpret.

Frequency Domain Analysis Experiments

Different feature combinations were tested for how well they classify GAN images:

Feature Combination	Test AUC	Test F1 Score
HSV color space only	0.580	0.588
FFT frequency domain features only	0.637	0.607
Wavelet transform features only	0.658	0.627
FFT + Wavelet transform	0.719	0.673
HSV + FFT + Wavelet transform	0.726	0.673

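Key insight: Frequency-domain information carries most of the signal that separates generated images from real photos, because AI-generated artifacts form distinctive patterns in the frequency domain. In the table, FFT and wavelet features each outperform HSV color features, and adding HSV to FFT + wavelet raises AUC only from 0.719 to 0.726 while leaving F1 unchanged at 0.673.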
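
One common way to turn an image's spectrum into a fixed-length feature vector is a radially averaged log power spectrum. The source does not say which FFT features were used, so the sketch below is only an illustration under that assumption; the function name `radial_power_spectrum`, the bin count, and the random stand-in image are all hypothetical.

```python
import numpy as np


def radial_power_spectrum(gray: np.ndarray, n_bins: int = 32) -> np.ndarray:
    """Return the log power spectrum averaged over rings of equal radius.

    Assumption: this is one plausible "FFT feature", not necessarily the
    one used in the project.
    """
    # 2-D FFT with the zero frequency shifted to the center of the array.
    spectrum = np.fft.fftshift(np.fft.fft2(gray))
    power = np.log1p(np.abs(spectrum) ** 2)

    # Distance of every frequency bin from the center.
    h, w = gray.shape
    yy, xx = np.indices((h, w))
    radius = np.hypot(yy - h // 2, xx - w // 2)

    # Assign each bin to one of n_bins rings and average the power per ring.
    ring = np.minimum((radius / radius.max() * n_bins).astype(int), n_bins - 1)
    sums = np.bincount(ring.ravel(), weights=power.ravel(), minlength=n_bins)
    counts = np.bincount(ring.ravel(), minlength=n_bins)
    return sums / np.maximum(counts, 1)


rng = np.random.default_rng(0)
image = rng.random((128, 128))  # stand-in for a grayscale image
features = radial_power_spectrum(image)
print(features.shape)
```

The low-index entries describe coarse image structure and the high-index entries describe fine detail, which is where generator artifacts are usually looked for.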

Section 03

Method Architecture: WaveletHybridNet Dual-Branch Design and Training Configuration

Dual-Branch Architecture

RGB Branch: Extracts visual features such as color and texture, capturing semantic information in pixel space;
Wavelet Branch: Uses Daubechies db4 wavelet with two-level decomposition to extract high-frequency subbands (LH, HL, HH) and capture AI-generated artifacts.
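
The wavelet branch's input can be sketched with PyWavelets, which the project lists in its technology stack. This is a minimal illustration, not the project's code: the helper name `high_freq_subbands`, the random stand-in channel, and the `periodization` boundary mode (chosen here so that sizes halve exactly) are assumptions.

```python
import numpy as np
import pywt


def high_freq_subbands(channel: np.ndarray, wavelet: str = "db4", level: int = 2):
    """Decompose one image channel and return only the detail subbands.

    pywt.wavedec2 returns [cA_n, (cH_n, cV_n, cD_n), ..., (cH_1, cV_1, cD_1)].
    The horizontal, vertical, and diagonal detail arrays roughly correspond
    to the LH, HL, and HH subbands (naming conventions differ between texts).
    """
    coeffs = pywt.wavedec2(channel, wavelet=wavelet, level=level,
                           mode="periodization")  # boundary mode is an assumption
    # coeffs[0] is the low-frequency approximation (LL); keep only the
    # detail tuples that follow it, ordered from coarsest to finest.
    return coeffs[1:]


rng = np.random.default_rng(0)
channel = rng.random((128, 128))  # stand-in for one 128x128 image channel
details = high_freq_subbands(channel)
for depth, (cH, cV, cD) in zip((2, 1), details):
    print(f"level {depth}: {cH.shape}, {cV.shape}, {cD.shape}")
```

Because the two levels produce subbands of different sizes, a network that consumes them has to resize or process each level separately; the source does not say which approach the project takes.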

Training Configuration

Parameter	Value
Optimizer	AdamW
Learning rate	1×10⁻⁴
Weight decay	1×10⁻⁴
Batch size	8
Input size	128×128
Wavelet type	db4 (2-level decomposition)
Validation strategy	2-fold cross-validation
Early stopping patience	3 epochs
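
The table above can be read as a small configuration object plus two pieces of control logic: the fold split and the early-stopping rule. The sketch below uses only the standard library and is illustrative; the names `TrainConfig`, `kfold_splits`, and `epochs_until_stop` are hypothetical, the interleaved split is just one way to build folds, and monitoring validation loss is an assumption because the source does not name the monitored metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values copied from the training configuration table.
    optimizer: str = "AdamW"
    learning_rate: float = 1e-4
    weight_decay: float = 1e-4
    batch_size: int = 8
    input_size: int = 128
    wavelet: str = "db4"
    wavelet_levels: int = 2
    n_folds: int = 2
    patience: int = 3


def kfold_splits(n_samples: int, n_folds: int):
    """Yield (train_indices, val_indices); each fold validates exactly once."""
    indices = list(range(n_samples))
    for fold in range(n_folds):
        val = indices[fold::n_folds]
        held_out = set(val)
        train = [i for i in indices if i not in held_out]
        yield train, val


def epochs_until_stop(val_losses, patience: int) -> int:
    """Return the epoch at which training stops.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs (assumed criterion).
    """
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return len(val_losses)


cfg = TrainConfig()
splits = list(kfold_splits(10, cfg.n_folds))
# Hypothetical validation losses: the best value is reached at epoch 3.
losses = [0.60, 0.45, 0.40, 0.41, 0.42, 0.43, 0.39]
stopped_at = epochs_until_stop(losses, cfg.patience)
print(splits[0][1], stopped_at)
```

With a patience of 3, the run above stops after three consecutive epochs without improvement and never reaches the final, lower loss. That is the usual trade-off of a short patience value.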

Section 04

Core Experimental Results: Verification of Cross-Generator Generalization Ability

Experimental Design

The model was trained only on Stable Diffusion v1.5 images and tested on GAN-generated images it had never seen.

Test Results

Metric	Value
Overall accuracy	95.4%
Real image recognition rate	94% (2360/2507)
Fake image recognition rate	97% (2411/2493)
False positives	147 images
False negatives	82 images
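
The figures in the table are consistent with one another, which a short calculation confirms. It uses only the counts reported above and treats a false positive as a real image flagged as fake, the reading that matches the reported 147.

```python
# Counts taken from the results table.
real_total, real_correct = 2507, 2360
fake_total, fake_correct = 2493, 2411

false_positives = real_total - real_correct  # real images flagged as fake
false_negatives = fake_total - fake_correct  # fake images passed as real

accuracy = (real_correct + fake_correct) / (real_total + fake_total)
real_rate = real_correct / real_total
fake_rate = fake_correct / fake_total

print(f"false positives: {false_positives}, false negatives: {false_negatives}")
print(f"accuracy: {accuracy:.1%}")
print(f"real recognition rate: {real_rate:.1%}")
print(f"fake recognition rate: {fake_rate:.1%}")
```

The test set holds 5,000 images in total, and the two per-class rates round to the 94% and 97% shown in the table.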

Result Interpretation

The model detects GAN-generated images with high accuracy despite never having seen them during training. This result supports three conclusions:
1. The frequency-domain artifacts captured by the wavelet branch transfer across generators;
2. Different generators leave similar fingerprints in the frequency domain;
3. The hybrid architecture separates general frequency-domain traces from generator-specific features.

Section 05

Application Deployment: Implementation and Usage of DeepTrace Desktop Tool

System Architecture

Browser/Desktop App → Spring Boot Backend → Python Subprocess Inference → Return Results
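
The Python end of this pipeline could look roughly like the sketch below: a script that receives an image path, runs the model, and prints one JSON line for the parent process to parse. Everything here is an assumption made for illustration, including the script name `infer.py`, the JSON field names, the 0.5 threshold, and the placeholder `predict_fake_probability`; the source does not describe the actual interface.

```python
import json


def predict_fake_probability(image_path: str) -> float:
    """Placeholder for the real model call (hypothetical).

    A real implementation would load the checkpoint, preprocess the image,
    and return the predicted probability that the image is fake.
    """
    return 0.5


def build_response(prob_fake: float, threshold: float = 0.5) -> str:
    """Serialize a verdict as one JSON line that a parent process can parse."""
    is_fake = prob_fake >= threshold
    confidence = prob_fake if is_fake else 1.0 - prob_fake
    return json.dumps({"label": "FAKE" if is_fake else "REAL",
                       "confidence": round(confidence, 4)})


def main(argv) -> int:
    """Handle one request; return a process exit code."""
    if len(argv) != 2:
        print(json.dumps({"error": "usage: infer.py <image_path>"}))
        return 2
    print(build_response(predict_fake_probability(argv[1])))
    return 0


# Simulate the call a backend might make: infer.py face.png
exit_code = main(["infer.py", "face.png"])
```

Writing a single JSON line to standard output keeps the contract between the two processes simple: the backend reads one line, parses it, and maps the exit code to success or failure.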

Technology Stack

Layer	Technology
Machine Learning	PyTorch, PyWavelets (db4)
Traditional Baseline	scikit-learn SVM
Web Backend	Spring Boot 3.2, Java 17
Desktop App	Java Swing, System Tray API

Usage Steps

Run DeepTrace.exe; the D icon appears in the system tray;
Double-click the icon to start a screenshot capture, then drag the selection box around the face;
Release the mouse button to get the REAL/FAKE result and its confidence level;
Press ESC or right-click to cancel.

Section 06

Limitations and Future Directions: Current Restrictions and Expansion Plans

Current Limitations

Optimized for Portraits and Avatars: Detection is unreliable on scenes and other non-facial content;
Single Training Data Source: The model was trained only on Stable Diffusion v1.5 images.

Future Plans

Expand the training dataset to include generators such as DALL·E, Wukong, and additional GAN families to improve cross-generator accuracy and robustness.

Section 07

Conclusion: Project Significance and Open-Source Contribution

The DeepTrace project shows that a hybrid RGB and wavelet architecture can reach high accuracy while generalizing to a generator family it was not trained on, which matters for building detection systems that keep pace with the generative AI ecosystem. The open-source implementation and desktop tool make the detector accessible to both researchers and everyday users who need to identify AI-generated content.

Resource Links:

GitHub Repository: https://github.com/firiusz123/Deepfake-detection
Pre-trained Model: Download best_fold_0.pt from the Releases page