Section 01
SafeProbe: Open-Source Security Alignment Evaluation Toolkit for LLMs
SafeProbe is an open-source Python toolkit focused on evaluating the security alignment capabilities of large language models (LLMs) during the inference phase. It supports automated red team attacks, multi-dimensional robustness metrics, and chain-of-thought-based semantic security judgment. Its design targets both academic research (with reproducibility) and engineering integration (CI/CD pipelines), addressing the gap in deep security evaluation beyond surface-level keyword filtering.