Section 01
[Introduction] ACL 2026 Paper: Rethinking Jailbreak Detection for Large Vision-Language Models with the RCS Method
This article introduces Representational Contrastive Scoring (RCS), a method proposed in an open-source ACL 2026 paper for detecting jailbreak attacks against Large Vision-Language Models (LVLMs). RCS identifies malicious prompts by contrasting the model's internal representations of normal inputs with those of jailbreak inputs. The method's open-source codebase supports mainstream models such as LLaVA and Qwen-VL, and aims to improve detection accuracy and robustness while advancing multimodal AI security research.
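The core idea of contrasting representations of normal and jailbreak inputs can be sketched as follows. This is a hypothetical illustration, not the paper's actual formulation: the representation extraction, the centroid-based scoring, and all names here (`centroid`, `contrastive_score`) are assumptions for the sake of the example; the real RCS scoring rule may differ.

```python
import numpy as np


def centroid(reps: np.ndarray) -> np.ndarray:
    """Mean representation over a calibration set of shape (n_samples, hidden_dim)."""
    return reps.mean(axis=0)


def contrastive_score(rep: np.ndarray,
                      benign_centroid: np.ndarray,
                      jailbreak_centroid: np.ndarray) -> float:
    """Score an input representation by how much closer it lies (in cosine
    similarity) to the jailbreak reference than to the benign reference.
    Positive scores flag the input as a likely jailbreak."""
    def cos(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return cos(rep, jailbreak_centroid) - cos(rep, benign_centroid)


# Synthetic stand-ins for LVLM hidden states: in a real pipeline these would
# be pooled hidden representations extracted from the model for calibration
# prompts of each class.
rng = np.random.default_rng(0)
benign_reps = rng.normal(loc=0.0, scale=0.1, size=(32, 64))
jailbreak_reps = rng.normal(loc=1.0, scale=0.1, size=(32, 64))

b_c = centroid(benign_reps)
j_c = centroid(jailbreak_reps)

# A new input whose representation resembles the jailbreak cluster
# receives a positive contrastive score.
suspect = rng.normal(loc=1.0, scale=0.1, size=64)
print(contrastive_score(suspect, b_c, j_c) > 0.0)
```

In practice, a detection threshold on such a score would be tuned on held-out calibration data to trade off false positives against missed jailbreaks.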