Section 01
[Introduction] Beyond Semantics: New Breakthrough in Cross-Modal Synthetic Image Detection via Physical Features + CLIP
This paper addresses the deepfake detection challenge posed by AIGC with an approach grounded in the physical properties of images: it systematically examines 15 physical features, selects the 5 core features that remain stable across datasets, and combines them with CLIP's semantic understanding. The method achieves SOTA results on the GenImage benchmark, with accuracy of up to 99.8% on some datasets, addressing the limited generalization ability of existing detectors.
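To make the "select stable features, then combine with CLIP" idea concrete, here is a minimal sketch under stated assumptions. It is not the paper's implementation: the stability criterion (mean score minus standard deviation across datasets), the fusion by simple concatenation, the 512-dimensional embedding size, and the names `select_stable_features` and `fuse` are all hypothetical choices made for illustration. The scores and vectors are random placeholders, not real measurements or real CLIP outputs.

```python
import random
from statistics import mean, pstdev


def select_stable_features(scores, k=5):
    """Pick the k features whose scores are high and consistent across datasets.

    scores[d][f] is a hypothetical discriminative score of feature f on
    dataset d. The ranking rule (mean minus population standard deviation)
    is an assumption for illustration, not the paper's criterion.
    """
    n_features = len(scores[0])
    stability = []
    for f in range(n_features):
        column = [row[f] for row in scores]
        stability.append(mean(column) - pstdev(column))
    ranked = sorted(range(n_features), key=lambda f: stability[f], reverse=True)
    return sorted(ranked[:k])


def fuse(physical, clip_embedding, selected):
    """Concatenate the selected physical features with a semantic embedding.

    Plain concatenation is an assumed fusion strategy.
    """
    return [physical[i] for i in selected] + list(clip_embedding)


# Placeholder data: 8 datasets x 15 candidate physical features.
random.seed(0)
scores = [[random.uniform(0.5, 1.0) for _ in range(15)] for _ in range(8)]
selected = select_stable_features(scores, k=5)

# Placeholder inputs for one image: 15 physical feature values and a
# stand-in for a CLIP image embedding (512 dimensions is an assumption).
physical = [random.gauss(0.0, 1.0) for _ in range(15)]
clip_embedding = [random.gauss(0.0, 1.0) for _ in range(512)]

fused = fuse(physical, clip_embedding, selected)
print(len(selected), len(fused))
```

In a real pipeline the fused vector would feed a classifier that outputs a real-versus-synthetic decision, and the embedding would come from an actual CLIP image encoder rather than random numbers.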