Section 01
[Introduction] Activation Boundary Defense: A New Neuron-Level Approach to LLM Jailbreak Protection
This article introduces Activation Boundary Defense (ABD), a new neuron-level technique for defending LLMs against jailbreak attacks. Its core idea is to use Bayesian optimization to adaptively constrain activation values in the low and middle layers of the network. ABD achieves a defense success rate of over 98% while reducing the model's performance on normal tasks by less than 2%, and its computational overhead remains manageable, offering a new perspective on LLM security.
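To make the idea of constraining activation values concrete, here is a minimal sketch in plain Python. It only illustrates the clipping step: the function names `clamp_activations` and `apply_boundaries`, the layer indices, and the numeric bounds are all hypothetical placeholders chosen for illustration, not values or APIs from ABD itself. In the approach described above, the bounds would be selected adaptively by Bayesian optimization rather than written by hand.

```python
from typing import Dict, List, Sequence, Tuple


def clamp_activations(
    activations: Sequence[float], lower: float, upper: float
) -> List[float]:
    """Clip every activation value into the closed interval [lower, upper]."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return [min(max(a, lower), upper) for a in activations]


def apply_boundaries(
    layer_activations: Sequence[Sequence[float]],
    bounds: Dict[int, Tuple[float, float]],
) -> List[List[float]]:
    """Clip only the layers that have an entry in `bounds`.

    `bounds` maps a layer index to a (lower, upper) pair. Layers without
    an entry are copied through unchanged, mirroring the idea that only
    some layers (here, the earlier ones) are constrained.
    """
    constrained = []
    for index, activations in enumerate(layer_activations):
        if index in bounds:
            lower, upper = bounds[index]
            constrained.append(clamp_activations(activations, lower, upper))
        else:
            constrained.append(list(activations))
    return constrained


# Toy activations for three layers (made-up numbers).
layers = [
    [-3.0, 0.5, 4.2],
    [1.0, -7.5, 2.0],
    [9.0, -9.0, 0.0],
]

# Hypothetical hand-picked bounds for layers 0 and 1 only; layer 2 is left alone.
example_bounds = {0: (-2.0, 2.0), 1: (-1.5, 1.5)}

constrained_layers = apply_boundaries(layers, example_bounds)
print(constrained_layers)
```

In a real model the same clipping would be applied to hidden-state tensors during the forward pass, and the per-layer bounds would be the parameters that the optimizer searches over.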