Section 01
[Introduction] Key Points of the Compact Multimodal Approach to ID Card Presentation Attack Detection, from Vision to Text
This study tackles two obstacles in ID card presentation attack detection (PAD), cross-domain generalization and data scarcity, by proposing a compact multimodal model that combines vision and text and performs detection through generative and discriminative modules. The model generalizes well across domains after supervised fine-tuning but performs poorly in zero-shot settings, which underscores how much reliable detection depends on real data. The study presents this approach as a new direction for authentication security.