Section 01
[Introduction] Cross-Modal Prompt Injection Attacks: New Security Challenges for LVLMs
Cross-Modal Prompt Injection (CrossMPI) is an emerging security vulnerability targeting Large Vision-Language Models (LVLMs). Without supplying any malicious text, attackers can manipulate model behavior through image perturbations that are barely perceptible to the human eye. This article analyzes the principles, harms, and defense strategies of this attack in depth, exposes key blind spots in the security of multimodal AI systems, and calls on developers and users to take cross-modal security protection seriously.
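To make "barely perceptible" concrete, the sketch below shows the standard imperceptibility constraint such attacks typically rely on: an adversarial perturbation added to a benign image, with its per-pixel amplitude capped by an L-infinity budget epsilon. The specific values (epsilon = 8/255, a 224x224 RGB image) and the random stand-in perturbation are illustrative assumptions, not details from this article; a real attack would optimize the perturbation against the target LVLM.

```python
import numpy as np

# Illustrative sketch (assumed parameters, not from the article):
# a cross-modal injection payload is delivered as a perturbation
# `delta` added to a benign image `x`, kept within an L-infinity
# budget so the change stays below human perception.

rng = np.random.default_rng(0)
epsilon = 8 / 255  # max per-pixel change, on a [0, 1] pixel scale

x = rng.random((224, 224, 3))            # benign image, pixels in [0, 1]
delta = rng.uniform(-1, 1, x.shape)      # stand-in for an optimized perturbation
delta = np.clip(delta, -epsilon, epsilon)  # enforce the imperceptibility budget

x_adv = np.clip(x + delta, 0.0, 1.0)     # result must remain a valid image

# The final image deviates from the original by at most epsilon per pixel.
print(float(np.abs(x_adv - x).max()) <= epsilon)  # prints: True
```

The same budgeted-perturbation structure appears throughout the adversarial-examples literature; what distinguishes CrossMPI is the objective the perturbation is optimized for, namely steering the LVLM's text output rather than merely flipping a classifier's label.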