Section 01
[Introduction] Exploration and Limitations of Multimodal Models in Identity Document PAD
This article explores the application of large multimodal models such as PaliGemma, LLaVA, and Qwen to presentation attack detection (PAD) for identity documents. It finds that these general-purpose models perform poorly on this security task, analyzes the reasons, and points out directions for future improvement.