Section 01
Introduction / Original Post: Using Causal Graphs and Counterfactual Chains to Achieve Concept-Level Interpretability of Large Language Models
This article introduces a new method that uses causal graphs to model the reasoning process of large language models (LLMs). Using MCMC-style counterfactual data augmentation, it builds human-understandable, concept-level causal graphs that give transparent explanations for the otherwise black-box decisions of LLMs.
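To make the idea concrete, here is a toy sketch of a counterfactual chain over concepts. It is not the article's algorithm, since the summary above gives no implementation details. Everything in it is an assumption made for illustration: the concept names, the `toy_model` rule standing in for a black-box LLM decision, the single-flip proposal that is always accepted (a real MCMC sampler would add an acceptance step), and the threshold used to keep an edge.

```python
import random
from collections import defaultdict

# Hypothetical concept names, chosen only for this illustration.
CONCEPTS = ["price_mentioned", "negative_tone", "urgent_request"]


def toy_model(state):
    """Stand-in for a black-box LLM decision: 1 means "escalate", 0 means "do not"."""
    # Hypothetical rule: escalate only when the tone is negative AND the request is urgent.
    return int(state["negative_tone"] and state["urgent_request"])


def counterfactual_chain(model, concepts, steps, seed=0):
    """Walk a chain of single-concept flips and measure how often each flip changes the output."""
    rng = random.Random(seed)
    state = {c: rng.random() < 0.5 for c in concepts}  # random starting assignment
    flips = defaultdict(int)    # how many times each concept was flipped
    changes = defaultdict(int)  # how many of those flips changed the decision
    for _ in range(steps):
        concept = rng.choice(concepts)
        before = model(state)
        # Counterfactual edit: flip one concept, keep the rest fixed.
        # Simplification: the proposal is always accepted.
        state = {**state, concept: not state[concept]}
        after = model(state)
        flips[concept] += 1
        changes[concept] += int(before != after)
    return {c: (changes[c] / flips[c] if flips[c] else 0.0) for c in concepts}


def build_graph(effects, threshold=0.1):
    """Keep an edge concept -> decision when the estimated effect reaches the threshold."""
    return [(c, "decision", round(w, 2)) for c, w in effects.items() if w >= threshold]


effects = counterfactual_chain(toy_model, CONCEPTS, steps=2000)
edges = build_graph(effects)
for source, target, weight in edges:
    print(f"{source} -> {target} (flip effect {weight})")
```

In this toy setup the resulting graph has edges only from `negative_tone` and `urgent_request` to the decision, because flipping `price_mentioned` never changes the output of `toy_model`. A concept-level graph for a real LLM would also need edges between concepts and a way to map text to concepts, neither of which this sketch attempts.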