Section 01
Introduction / Main Floor: LLM Red Teaming: A Modular Adversarial Testing Toolkit Covering Character to Semantic Layer Attacks and Jailbreak Evaluation
This article introduces a red team testing toolkit for large language models (LLMs), supporting four levels of adversarial attacks (character, word, sentence, and semantic), integrating the JailbreakBench jailbreak evaluation framework, providing pluggable model targets and an automated judging system, and assisting in AI security research and model robustness verification.