Section 01
Introduction: RuleSHAP—An Explainable AI Tool for Auditing LLM Injection Behaviors
RuleSHAP is an explainable AI (XAI) method that detects and explains misleading behaviors intentionally injected into large language models (LLMs), providing a practical tool for AI security auditing. Its core innovation is combining SHAP feature attribution with rule extraction, which captures feature interaction effects and produces human-understandable rule expressions. The project corresponds to a 2026 ACM SIGKDD conference paper.
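To make the idea of pairing attribution with rule extraction concrete, the following is a minimal toy sketch and not the RuleSHAP algorithm itself. It assumes a hypothetical scoring function `toy_model` over three made-up binary prompt features, computes exact Shapley values by enumerating coalitions against an all-zeros baseline, and then forms one conjunctive rule from the features with the largest mean absolute attribution. The feature names, the 0.5 threshold, and the helper `extract_rule` are all illustrative assumptions.

```python
from itertools import combinations, product
from math import factorial

# Hypothetical binary prompt features (assumed for illustration only).
FEATURES = ["topic_is_health", "mentions_brand", "is_question"]


def toy_model(x):
    """Hypothetical stand-in for a bias score measured on an LLM's output.

    The injected behavior is an interaction: the score jumps only when
    both of the first two features are present together.
    """
    return 0.1 * x[2] + 0.8 * (x[0] and x[1])


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, with absent features set to baseline."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


def extract_rule(f, baseline, top_k=2, threshold=0.5):
    """Build one conjunctive rule from the top_k features by mean |Shapley value|.

    This is a deliberately simple heuristic, assumed for this sketch.
    Returns the rule text plus its precision and recall over all inputs.
    """
    n = len(baseline)
    inputs = list(product([0, 1], repeat=n))
    importance = [0.0] * n
    for x in inputs:
        for i, value in enumerate(shapley_values(f, x, baseline)):
            importance[i] += abs(value) / len(inputs)
    top = sorted(sorted(range(n), key=lambda i: -importance[i])[:top_k])

    covered = [x for x in inputs if all(x[i] == 1 for i in top)]
    hits = [x for x in covered if f(x) >= threshold]
    high = [x for x in inputs if f(x) >= threshold]

    conditions = " AND ".join(f"{FEATURES[i]} = 1" for i in top)
    rule = f"IF {conditions} THEN score >= {threshold}"
    precision = len(hits) / len(covered) if covered else 0.0
    recall = len(hits) / len(high) if high else 0.0
    return rule, precision, recall


if __name__ == "__main__":
    print(shapley_values(toy_model, (1, 1, 1), (0, 0, 0)))
    print(extract_rule(toy_model, (0, 0, 0)))
```

In this toy setting the per-feature attributions split the interaction evenly between the two interacting features, so attribution alone does not reveal that they only matter jointly. The extracted conjunction states that dependency explicitly, which is the kind of readable output the paragraph above describes.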