HSR: Reconstructing Safety Defenses for Pruned Multimodal Large Models

Research accepted at ACL 2025 proposes a hierarchical safety realignment method that restores the safety capabilities of pruned vision-language models with almost no additional computational overhead.

Tags: model pruning, safety alignment, vision-language models, ACL 2025, model compression, AI safety
Published 2026-05-21 10:41 | Recent activity 2026-05-21 10:54 | Estimated read 5 min

Section 01

[Main Post] HSR: Reconstructing Safety Defenses for Pruned Multimodal Large Models

Research accepted at ACL 2025 proposes Hierarchical Safety Realignment (HSR), a method that restores the safety capabilities of pruned vision-language models with almost no additional computational overhead. It addresses the weakening of safety alignment caused by model compression (e.g., pruning) without requiring another expensive round of safety fine-tuning.

Section 02

Background: Safety Dilemma Brought by Model Compression

Compression techniques for large models (e.g., pruning, quantization) are key to deploying multimodal models, but compression often weakens a model's safety alignment: the smaller, faster model is more likely to generate harmful outputs. The traditional remedy is to repeat safety fine-tuning on the compressed model, which is expensive and undercuts the cost savings that motivated compression in the first place.

Section 03

Core Ideas and Technical Mechanisms of HSR

Core Idea

The core insight of HSR (Hierarchical Safety Realignment) is that pruning mainly changes the model at the parameter level, while the hierarchical structure of the semantic representations that safety alignment relies on remains largely intact. Targeted intervention at a few key levels is therefore sufficient to restore safety capabilities.

Technical Mechanisms

  1. Hierarchical Intervention Strategy: Divide the representation space of the vision-language model into multiple semantic levels, identify the levels that matter most for safety alignment, and apply lightweight realignment constraints to them;
  2. Adaptive Gating Mechanism: Dynamically adjust the realignment strength based on how sensitive the input is, with light intervention for ordinary queries and strong constraints for risky inputs;
  3. Compatibility with the Pruning Process: HSR is applied as a separate step after pruning and needs neither the original training data nor a full fine-tuning cycle.
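
To make the first two mechanisms concrete, here is a minimal sketch of how level selection and input-dependent gating could fit together. It is an illustration of the description above, not the paper's implementation: the function names, the per-level safety scores, the sigmoid gate, and the `threshold` and `sharpness` parameters are all assumptions introduced for this example.

```python
import math


def select_key_levels(level_scores: dict[str, float], top_k: int) -> list[str]:
    """Pick the top_k levels with the highest (hypothetical) safety-relevance score."""
    ranked = sorted(level_scores, key=level_scores.get, reverse=True)
    return ranked[:top_k]


def gate_strength(risk_score: float, threshold: float = 0.5,
                  sharpness: float = 10.0) -> float:
    """Map an input risk score in [0, 1] to an intervention strength in (0, 1).

    A sigmoid is an assumed choice: low-risk inputs get a gate near 0
    (light intervention), high-risk inputs get a gate near 1.
    """
    return 1.0 / (1.0 + math.exp(-sharpness * (risk_score - threshold)))


def realign(hidden: list[float], correction: list[float],
            risk_score: float) -> list[float]:
    """Add a safety correction to a hidden vector, scaled by the gate."""
    g = gate_strength(risk_score)
    return [h + g * c for h, c in zip(hidden, correction)]


# Toy usage with made-up scores and vectors (not measurements from the paper).
key_levels = select_key_levels({"low": 0.1, "mid": 0.7, "high": 0.4}, top_k=2)
benign = realign([1.0, 2.0], [1.0, -1.0], risk_score=0.05)
risky = realign([1.0, 2.0], [1.0, -1.0], risk_score=0.95)
```

With these toy inputs, the benign query's hidden vector is left almost unchanged while the risky one receives nearly the full correction, which is the behavior the adaptive gate is described as providing.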

Section 04

Experimental Validation: Balancing Safety Restoration and Efficiency

In tests on multiple vision-language models, HSR is reported to achieve the following:

  • Safety Restoration Rate: The harmful output rate dropped to close to that of the original unpruned model;
  • Performance Retention: Minimal loss in accuracy for standard vision-language tasks;
  • Computational Overhead: Reduced by several orders of magnitude compared to full safety fine-tuning.
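
The post does not give numbers or define the metric precisely, so the following is only one plausible way to quantify "safety restoration": compare the harmful output rate of the original, pruned, and realigned models and report how much of the gap opened by pruning was closed. The function names and the formula are assumptions for illustration, and the inputs are placeholder judgments, not results from the paper.

```python
def harmful_rate(judgments: list[bool]) -> float:
    """Fraction of responses judged harmful (True = harmful)."""
    if not judgments:
        raise ValueError("need at least one judgment")
    return sum(judgments) / len(judgments)


def restoration_fraction(original: float, pruned: float, restored: float) -> float:
    """Share of the pruning-induced increase in harmful rate that was undone.

    1.0 means the realigned model matches the unpruned model;
    0.0 means realignment changed nothing relative to the pruned model.
    """
    gap = pruned - original
    if gap <= 0:
        raise ValueError("pruning did not increase the harmful rate")
    return (pruned - restored) / gap


# Placeholder judgments for a four-prompt toy benchmark.
rate = harmful_rate([True, False, False, False])
```

Reporting this fraction next to task accuracy makes the trade-off in the list above explicit: a value near 1.0 with little accuracy loss corresponds to the outcome described.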

Section 05

Practical Significance and Application Prospects

HSR offers a feasible path for deploying safe multimodal models on edge devices (e.g., mobile and embedded systems), letting developers keep the benefits of compression without sacrificing safety alignment. It also suggests that compression and safety are not inherently at odds: understanding a model's hierarchical representations makes it possible to balance efficiency and safety.

Section 06

Summary and Outlook

HSR is a notable advance at the intersection of model compression and safety alignment. It shows that lightweight intervention can substantially restore safety capabilities, which makes compressed models safer to deploy. Future work could extend the approach to other compression paradigms (e.g., quantization, distillation) and to a wider range of modality combinations.