Technical Principles and Attack Mechanism
The core mechanism of the ComicJailbreak attack exploits the sequential, context-dependent nature of visual narratives. To understand a comic, an MLLM must integrate information from multiple frames into a complete storyline. Attackers can exploit this by arranging elements that are individually harmless into a narrative structure that carries a specific intent.
The attack relies on the following three aspects:
Semantic manipulation of frame sequences: The order of frames is carefully designed to guide the model along a specific reasoning path. No single frame triggers an alert, but the frames read in sequence may lead to harmful outputs.
Coordination between dialogue boxes and visual elements: The text in comic dialogue boxes is short, but combined with the visual context it carries rich semantics. By interweaving text and images, attackers can disperse a harmful intent across multiple modalities.
Inductive effect of narrative structure: A comic's narrative structure shapes the model's expectations. By manipulating pacing and plot development, attackers can induce the model to generate responses that violate safety policies.
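
The point that no single frame triggers an alert while the sequence as a whole does can be illustrated from the defender's side with a toy check. The following is a minimal sketch, not the method of any real safety filter: the `Frame` class, the abstract tags "A", "B" and "C", and the `FLAGGED_COMBINATIONS` policy are all hypothetical placeholders, and a real system would work with learned representations rather than tag sets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """One comic panel reduced to abstract concept tags (hypothetical model)."""
    image_tags: frozenset
    dialogue_tags: frozenset

    @property
    def tags(self) -> frozenset:
        # A panel's meaning comes from both modalities together.
        return self.image_tags | self.dialogue_tags


# Hypothetical policy: content is flagged only when all tags of a combination co-occur.
FLAGGED_COMBINATIONS = [frozenset({"A", "B", "C"})]


def is_flagged(tags: frozenset) -> bool:
    """Return True if the tag set contains any complete flagged combination."""
    return any(combo <= tags for combo in FLAGGED_COMBINATIONS)


def per_frame_check(frames: list) -> list:
    """Inspect each frame in isolation."""
    return [is_flagged(frame.tags) for frame in frames]


def sequence_check(frames: list) -> bool:
    """Inspect the union of tags accumulated over the whole sequence."""
    combined = frozenset().union(*(frame.tags for frame in frames))
    return is_flagged(combined)


# Toy comic: each panel carries one tag, split across image and dialogue.
comic = [
    Frame(image_tags=frozenset({"A"}), dialogue_tags=frozenset()),
    Frame(image_tags=frozenset(), dialogue_tags=frozenset({"B"})),
    Frame(image_tags=frozenset({"C"}), dialogue_tags=frozenset()),
]

print(per_frame_check(comic))  # [False, False, False]
print(sequence_check(comic))   # True
```

In this toy setting every per-frame check passes, while the check over the accumulated sequence is flagged. A check that looked at only one modality would also pass here, since the dialogue alone contributes just one tag.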