Section 01
An Open-Source LLM Resilience Evaluation Framework Focused on Response Stability Under Semantic Perturbations
This article introduces llm-resilience-eval, an open-source framework for systematically measuring how stable a large language model's responses remain under semantics-preserving input perturbations. The framework addresses a gap in traditional evaluations, which focus on accuracy while ignoring consistency across input variations; it supports four core perturbation test types and provides a new tool for assessing model reliability.
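To make the core idea concrete, the following is a minimal sketch of a stability check under semantics-preserving perturbations: query the model with the original prompt and with paraphrased variants, then score how often the answers agree. This is an illustrative outline only, not the actual API of llm-resilience-eval; the query_model stub and the example paraphrases are hypothetical placeholders.

```python
def query_model(prompt: str) -> str:
    """Placeholder for a real model call (API request or local inference).

    Returns a fixed answer here so the sketch runs end to end; replace this
    with an actual call to the model under evaluation.
    """
    return "Canberra"


def consistency_score(prompt: str, variants: list[str]) -> float:
    """Fraction of perturbed prompts whose answer matches the original's.

    Uses a simple exact-match criterion after normalization; a full framework
    would likely compare responses semantically rather than string-for-string.
    """
    baseline = query_model(prompt).strip().lower()
    matches = sum(
        1 for variant in variants
        if query_model(variant).strip().lower() == baseline
    )
    return matches / len(variants) if variants else 1.0


if __name__ == "__main__":
    original = "What is the capital of Australia?"
    # Hypothetical semantics-preserving rewrites of the same question.
    paraphrases = [
        "Which city is Australia's capital?",
        "Tell me the capital city of Australia.",
        "Australia's capital is which city?",
    ]
    print(f"Consistency: {consistency_score(original, paraphrases):.2f}")
```

A model that answers all paraphrases the same way as the original scores 1.0; lower scores indicate that superficial rewording alone is enough to change its output, which is exactly the kind of instability the framework is designed to surface.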