ProjLens Reveals Backdoor Attack Mechanisms in Projection Layers of Multimodal Large Models

Multimodal Large Language Models (MLLMs) have achieved remarkable success in cross-modal understanding and generation, but their deployment faces severe threats from security vulnerabilities. ProjLens is an interpretability framework designed to reveal backdoor attack mechanisms in MLLMs. The study found that even normal downstream task alignment involving only fine-tuning of projection layers can introduce backdoor injection vulnerabilities, and its activation mechanism differs from that observed in pure-text LLMs.

Tags: Multimodal Large Language Models · Backdoor Attacks · Model Security · Interpretability · Projection Layers · Low-Rank Subspace · Semantic Shift · MLLM Security
Published 2026-04-21 12:52 · Recent activity 2026-04-22 12:10 · Estimated read 6 min

Section 01

[Introduction] ProjLens Reveals Core Mechanisms of Backdoor Attacks in Projection Layers of Multimodal Large Models

ProjLens is an interpretability framework for Multimodal Large Language Models (MLLMs), designed to reveal backdoor attack mechanisms in their projection layers. Key research findings include: even normal downstream task fine-tuning of only the projection layer can introduce backdoor injection vulnerabilities; backdoor parameters are encoded in a low-rank subspace of the projection layer, with no dedicated trigger neurons; and the activation mechanism relies on a linear relationship between the magnitude of the semantic shift and the input norm, so poisoned samples trigger the backdoor because of their large norm. These findings provide a critical basis for MLLM security defense.


Section 02

Research Background and Motivation

Backdoor attacks implant trigger patterns in training data, causing the model to produce malicious outputs when it encounters the trigger, a behavior that is difficult to detect with conventional testing. Backdoor mechanisms in pure-text LLMs have been studied, but because MLLMs contain visual-language projection layers, backdoors may manifest differently. The role of projection layers in backdoor attacks is the core question that ProjLens investigates.


Section 03

Overview of the ProjLens Framework

Through systematic experiments and analysis, the ProjLens framework reveals for the first time the key role of projection layers in backdoor attacks on MLLMs. Key finding: normal downstream task alignment (fine-tuning only the projection layer) also creates the conditions for backdoor injection, meaning that even seemingly benign fine-tuning scenarios pose security risks.


Section 04

Key Finding: Low-Rank Subspace Structure of Backdoor Parameters

Unlike text LLMs, which develop dedicated trigger neurons, backdoor weight updates in MLLMs are generally full-rank, yet the key parameters are encoded in a low-rank subspace of the projection layer. This distributed encoding makes backdoors stealthier and renders traditional detection methods based on neuron activations ineffective.
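A minimal sketch of how such a low-rank structure could be probed, assuming access to the projection-layer weights before and after fine-tuning. The function name, matrix sizes, and the planted rank-2 update are all illustrative, not the paper's actual procedure:

```python
# Hypothetical probe: does a weight update concentrate its energy
# in a few singular directions of the projection layer?
import numpy as np

def low_rank_energy(W_clean, W_poisoned, k=8):
    """Fraction of the update's energy captured by its top-k singular directions."""
    delta = W_poisoned - W_clean  # fine-tuning-induced weight update
    s = np.linalg.svd(delta, compute_uv=False)  # singular values, descending
    return (s[:k] ** 2).sum() / (s ** 2).sum()

# Toy demo: a rank-2 planted update hidden inside small full-rank noise,
# mimicking an update that is full-rank but dominated by a low-rank component.
rng = np.random.default_rng(0)
W_clean = rng.normal(size=(256, 512))
u = rng.normal(size=(256, 2))
v = rng.normal(size=(2, 512))
W_poisoned = W_clean + u @ v + 0.01 * rng.normal(size=(256, 512))

print(low_rank_energy(W_clean, W_poisoned, k=2))  # close to 1.0
```

In this toy setting the top-2 energy fraction is near 1.0 even though the update matrix itself has full rank, which is the kind of signature the low-rank-subspace finding describes.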


Section 05

Key Finding: Semantic Shift Activation Mechanism

Embedding vectors of both clean and poisoned samples undergo a semantic shift toward the backdoor target direction, but the magnitude of the shift is linearly related to the input norm. Poisoned samples have a larger input norm because of the trigger, so their shift is large enough to activate the backdoor; clean samples have a smaller norm, so their shift falls short of the activation threshold.
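The linear norm-shift relationship can be illustrated with a toy linear projection. Here `delta_W` stands in for a backdoor-induced weight update, and the clean/poisoned norms are illustrative assumptions rather than values from the study:

```python
# Hypothetical sketch of the norm-gated activation: a backdoor update delta_W
# shifts every projected embedding by delta_W @ x, so the shift magnitude
# grows linearly with the input norm ||x||.
import numpy as np

rng = np.random.default_rng(1)
d_in, d_out = 512, 256
delta_W = 0.05 * rng.normal(size=(d_out, d_in))  # stand-in for the backdoor update

def shift_magnitude(x):
    """Size of the semantic shift the update adds to the projected embedding."""
    return np.linalg.norm(delta_W @ x)

direction = rng.normal(size=d_in)
direction /= np.linalg.norm(direction)

clean = 1.0 * direction     # small-norm clean embedding
poisoned = 5.0 * direction  # trigger inflates the input norm

# Same direction, 5x the norm -> 5x the semantic shift (exact by linearity).
print(shift_magnitude(poisoned) / shift_magnitude(clean))
```

Because the map is linear, scaling the input scales the shift by the same factor, so a fixed activation threshold separates large-norm poisoned inputs from small-norm clean ones.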


Section 06

Experimental Validation and Attack Variants

The research team designed four backdoor attack variants (covering different trigger patterns and attack targets) for the experiments. The results show that the low-rank structure and the activation mechanism hold across all variants, indicating that these mechanisms are inherent properties of the projection-layer architecture in MLLMs.


Section 07

Security Implications and Defense Ideas

  1. Fine-tuning only the projection layer may introduce security risks; all fine-tuning operations warrant scrutiny.
  2. Potential backdoors can be detected by monitoring the low-rank components of projection-layer parameters.
  3. Defenses should focus on the geometric properties of the embedding space rather than searching only for obvious trigger patterns.
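The second idea could be sketched as a simple spectral check against a trusted reference checkpoint. The rank cutoff and threshold below are illustrative assumptions, not values from the study:

```python
# Hypothetical defense sketch: flag a fine-tuned projection layer if its update
# from a trusted reference checkpoint is dominated by a few singular directions.
import numpy as np

def flag_suspicious_update(W_ref, W_finetuned, k=4, threshold=0.8):
    """Return (is_flagged, energy concentration in the top-k singular directions)."""
    s = np.linalg.svd(W_finetuned - W_ref, compute_uv=False)
    concentration = (s[:k] ** 2).sum() / (s ** 2).sum()
    return concentration > threshold, concentration

rng = np.random.default_rng(2)
W_ref = rng.normal(size=(128, 256))

# Benign diffuse update vs. a rank-1-dominated (backdoor-like) update.
benign = W_ref + 0.1 * rng.normal(size=(128, 256))
u, v = rng.normal(size=(128, 1)), rng.normal(size=(1, 256))
backdoored = W_ref + u @ v + 0.01 * rng.normal(size=(128, 256))

print(flag_suspicious_update(W_ref, benign)[0])      # False
print(flag_suspicious_update(W_ref, backdoored)[0])  # True
```

A diffuse update spreads its energy across many singular directions, while a backdoor-like update concentrates it in a few, which is what the check exploits.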

Section 08

Conclusion

ProjLens reveals for the first time the key role of projection layers in backdoor attacks on MLLMs, deepens our understanding of MLLM security vulnerabilities, and lays a theoretical foundation for developing effective defense mechanisms. As multimodal AI becomes more widespread, the importance of such foundational security research will only grow.