章节 01
SafeProbe: An Open-Source Toolkit for LLM Security Alignment Evaluation
SafeProbe is an open-source Python toolkit focused on evaluating large language models' (LLMs) security alignment capabilities during the inference phase. It supports multiple attack vectors (jailbreak, prompt injection, adversarial prompt refinement) and a Chain-of-Thought (CoT)-based automated judging system. Designed to balance research reproducibility and practical deployment usability, it helps developers, researchers, and security engineers integrate security assessments into CI/CD pipelines and pre-deployment checks. It supports mainstream LLM providers (OpenAI, Anthropic, HuggingFace, etc.) and open-source models like Llama-3, Mistral, Qwen3.