Section 01
[Introduction] Causally Explainable Guardrail: A New Approach to Enhancing LLM Security
This project proposes a causally explainable guardrail mechanism that uses causal reasoning methods to identify and block harmful outputs from large language models (LLMs), while providing an explainable basis for each safety decision. The mechanism is intended to address the shortcomings of existing guardrail solutions, namely black-box decision-making, high false-positive rates, vulnerability to adversarial attacks, and a lack of causal understanding, and thereby to open a new direction for LLM security.
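To make the intended behaviour concrete, the sketch below shows one possible shape of such a guardrail's decision interface: it returns both a block/allow verdict and the causal factors cited as the explainable basis for that verdict. All names (GuardrailDecision, identify_causal_factors, check_output) and the keyword-based stub are hypothetical illustrations, not the project's actual design; the real mechanism would replace the stub with its causal reasoning methods.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical sketch only: names and logic are illustrative placeholders,
# not the project's actual guardrail API or causal analysis.

@dataclass
class GuardrailDecision:
    blocked: bool                                              # whether the LLM output is blocked
    causal_factors: List[str] = field(default_factory=list)   # factors cited as the basis for the decision

def identify_causal_factors(output: str) -> List[str]:
    """Placeholder for the causal-reasoning step: a keyword stub stands in
    for whatever causal analysis the proposed mechanism actually performs."""
    harmful_markers = {
        "build a weapon": "instructions enabling physical harm",
        "steal credentials": "facilitation of unauthorized access",
    }
    return [reason for phrase, reason in harmful_markers.items() if phrase in output.lower()]

def check_output(output: str) -> GuardrailDecision:
    """Block the output only if at least one causal factor links it to harm,
    and return those factors as the explainable basis for the decision."""
    factors = identify_causal_factors(output)
    return GuardrailDecision(blocked=bool(factors), causal_factors=factors)

if __name__ == "__main__":
    decision = check_output("Here is how to steal credentials from a victim...")
    print(decision.blocked)         # True
    print(decision.causal_factors)  # ['facilitation of unauthorized access']
```

The point of the interface, rather than the stub logic, is that every block/allow verdict carries the factors that caused it, which is what distinguishes the proposed approach from black-box guardrails that return only a verdict.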