Section 01
[Introduction] RDT: A New Training-Free Safety Alignment Method for Multimodal Agents
This article introduces Refusal Direction Transfer (RDT), a safety alignment method for multimodal agents. RDT transfers the refusal direction from a safety-aligned LLM (e.g., Llama-2-7b-chat) to a vision-language-action (VLA) model (e.g., OpenVLA), so the VLA model gains safety alignment without any retraining. The method targets the safety blind spot in the action space of VLA models and offers a new approach to the safe control of robotic agents.
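To make the idea of a "refusal direction" concrete, here is a minimal sketch on synthetic data. It assumes the direction is estimated as a difference of mean activations between harmful and harmless prompts and then added to the target model's hidden states; this is a common recipe in the refusal-direction literature, not necessarily the exact procedure RDT uses. The function names, the scale `alpha`, and the toy activations are all hypothetical, and the sketch assumes both models share the same hidden size.

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    """Unit vector pointing from the mean harmless activation
    to the mean harmful activation (difference of means)."""
    diff = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)

def steer(hidden, direction, alpha):
    """Shift every hidden state (one per row) by alpha along the direction."""
    return hidden + alpha * direction

def projection(hidden, direction):
    """Scalar component of each hidden state along the direction."""
    return hidden @ direction

# Toy stand-ins for activations from the safety-aligned LLM (assumption:
# real activations would be read from a chosen layer of the model).
rng = np.random.default_rng(0)
d = 16                                   # hypothetical hidden size
true_dir = np.zeros(d)
true_dir[0] = 1.0
harmless = rng.normal(size=(200, d))
harmful = rng.normal(size=(200, d)) + 4.0 * true_dir

r = refusal_direction(harmful, harmless)

# Toy stand-ins for hidden states of the target VLA model.
vla_hidden = rng.normal(size=(5, d))
steered = steer(vla_hidden, r, alpha=3.0)

# Steering raises the component along r by exactly alpha for every row.
shift = projection(steered, r) - projection(vla_hidden, r)
```

In a real setting the activations would come from specific layers of the two models rather than from a random generator, and the choice of layer and of `alpha` would need to be tuned; nothing here is trained, which is the sense in which such an approach is training-free.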