Section 01
Introduction: SARSteer — An Inference-Time Security Defense Framework for Large Audio Language Models
SARSteer Core Information
- Source: ICML 2026 accepted paper, published on arXiv in October 2025
- Positioning: Presented as the first inference-time defense method for large audio language models (LALMs)
- Techniques: Text-derived refusal steering combined with safe subspace ablation
- Effectiveness: Reported to block harmful audio queries while avoiding over-refusal of benign queries
- Keywords: Audio language models, AI security, jailbreak attack defense, representation engineering
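To make the two techniques named above more concrete, here is a minimal, generic representation-engineering sketch in pure Python. It is not the authors' implementation. It assumes that a refusal direction can be estimated as the difference between mean activations on harmful and benign text prompts, that a "safe subspace" is spanned by a few benign activation directions, and that steering means adding the ablated direction to a hidden state. All function names, the toy 3-dimensional activations, and the strength `alpha` are hypothetical.

```python
from typing import List

Vector = List[float]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def sub(a: Vector, b: Vector) -> Vector:
    return [x - y for x, y in zip(a, b)]

def scale(a: Vector, s: float) -> Vector:
    return [x * s for x in a]

def mean(vectors: List[Vector]) -> Vector:
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def refusal_direction(harmful: List[Vector], benign: List[Vector]) -> Vector:
    # Assumption: difference of mean activations on text prompts.
    return sub(mean(harmful), mean(benign))

def orthonormal_basis(vectors: List[Vector], eps: float = 1e-9) -> List[Vector]:
    # Gram-Schmidt; near-zero residuals are dropped.
    basis: List[Vector] = []
    for v in vectors:
        w = list(v)
        for b in basis:
            w = sub(w, scale(b, dot(w, b)))
        norm = dot(w, w) ** 0.5
        if norm > eps:
            basis.append(scale(w, 1.0 / norm))
    return basis

def ablate_subspace(v: Vector, basis: List[Vector]) -> Vector:
    # Remove the component of v lying in the span of an orthonormal basis.
    for b in basis:
        v = sub(v, scale(b, dot(v, b)))
    return v

def steer(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    # Add the steering direction to a hidden state with strength alpha.
    return [h + alpha * d for h, d in zip(hidden, direction)]

# Toy activations (hypothetical values, 3 dimensions).
harmful_acts = [[2.0, 1.0, 0.0], [4.0, 1.0, 0.0]]
benign_acts = [[0.0, 1.0, 0.0], [0.0, 3.0, 0.0]]
safe_basis = orthonormal_basis([[0.0, 1.0, 0.0]])  # assumed benign direction

raw_direction = refusal_direction(harmful_acts, benign_acts)
safe_direction = ablate_subspace(raw_direction, safe_basis)
steered = steer([1.0, 1.0, 1.0], safe_direction, alpha=0.5)
```

In this toy setup the ablated direction has no component along the assumed benign axis, so steering leaves that coordinate of the hidden state unchanged. That is the intuition behind pairing refusal steering with ablation to limit over-refusal.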
Original Authors and Sources
- Authors: Weilin Lin, Jianze Li, Hui Xiong, Li Liu
- Code link: https://github.com/linweiii/SARSteer
- Paper link: https://arxiv.org/abs/2510.17633