Section 01
[Introduction] An LLM Jailbreak Defense Scheme Based on Hidden State Causal Monitoring
This article proposes a defense against Large Language Model (LLM) jailbreak attacks called Hidden State Causal Monitoring. Rather than analyzing the generated output, the scheme monitors the model's internal hidden states and uses causal analysis to identify malicious inputs before a response is produced. This avoids the lag inherent in traditional post-hoc review and makes the defense proactive rather than reactive.
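To make the idea of flagging an input from its hidden state more concrete, here is a minimal sketch. It is not the scheme described in this article: it swaps in a simple difference-of-means linear probe, with no causal analysis, and uses hand-written toy vectors in place of real hidden states. The names `fit_direction` and `is_flagged`, the three-dimensional vectors, and the midpoint threshold are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fit_direction(
    benign: Sequence[Vector], malicious: Sequence[Vector]
) -> Tuple[Vector, float]:
    """Fit a difference-of-means probe (an assumed stand-in technique).

    Returns the direction pointing from the benign mean to the malicious
    mean, plus a threshold equal to the projection of the midpoint between
    the two means onto that direction.
    """
    mu_benign = mean_vector(benign)
    mu_malicious = mean_vector(malicious)
    direction = [m - b for m, b in zip(mu_malicious, mu_benign)]
    midpoint = [(m + b) / 2 for m, b in zip(mu_malicious, mu_benign)]
    return direction, dot(direction, midpoint)


def is_flagged(hidden_state: Vector, direction: Vector, threshold: float) -> bool:
    """Flag the input if its hidden state projects past the threshold.

    The check uses only the hidden state, so it can run before any
    output text has been generated.
    """
    return dot(hidden_state, direction) > threshold


# Toy stand-ins for hidden states captured from labeled prompts (hypothetical data).
benign_states = [[0.1, 0.9, 0.0], [0.2, 1.0, 0.1], [0.0, 0.8, 0.1]]
malicious_states = [[0.9, 0.1, 0.0], [1.0, 0.2, 0.1], [0.8, 0.0, 0.1]]

direction, threshold = fit_direction(benign_states, malicious_states)

suspicious = is_flagged([0.95, 0.05, 0.0], direction, threshold)
harmless = is_flagged([0.05, 0.95, 0.0], direction, threshold)
```

In a real deployment the vectors would come from a chosen layer of the model during the forward pass over the prompt, and the decision rule would be whatever the monitoring scheme specifies. The point of the sketch is only that the decision is made from internal activations, so a flagged request can be stopped before decoding starts.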