Section 01
CAGE Framework: Using Attribution Graphs to Explain the Reasoning Process of Large Language Models
This article introduces the CAGE (Context Attribution via Graph Explanations) framework, a new method for explaining the reasoning process of large language models by constructing attribution graphs. Existing context attribution methods share a key flaw: they ignore the mutual influence between generated tokens. By modeling these interactions explicitly, CAGE addresses that gap, improves attribution fidelity by up to 40% over traditional methods, and offers a new path for LLM interpretability research.
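The article does not spell out CAGE's construction here, but the core idea of an attribution graph can be sketched. The following toy example (all names, scores, and the `threshold` parameter are hypothetical, not from CAGE itself) builds a directed graph whose nodes are tokens and whose edges carry attribution scores, including edges between generated tokens rather than only from the prompt to each output:

```python
# Illustrative sketch only: CAGE's actual algorithm is not detailed in this
# summary. Here, an edge (i -> j) means token i contributed to generating
# token j, so influence *between* generated tokens is captured as well.
from typing import Dict, List, Tuple


def build_attribution_graph(
    tokens: List[str],
    scores: Dict[Tuple[int, int], float],  # (source_idx, target_idx) -> score
    threshold: float = 0.1,
) -> Dict[int, List[Tuple[int, float]]]:
    """Keep only forward edges whose attribution score exceeds `threshold`."""
    graph: Dict[int, List[Tuple[int, float]]] = {i: [] for i in range(len(tokens))}
    for (src, tgt), score in scores.items():
        # Attribution flows forward in generation order, so require src < tgt.
        if src < tgt and score >= threshold:
            graph[src].append((tgt, score))
    return graph


# Hypothetical data: tokens 0-1 are prompt tokens, tokens 2-3 are generated.
tokens = ["The", "capital", "of", "France"]
scores = {(0, 2): 0.05, (1, 2): 0.6, (1, 3): 0.3, (2, 3): 0.7}
graph = build_attribution_graph(tokens, scores)
print(graph)  # -> {0: [], 1: [(2, 0.6), (3, 0.3)], 2: [(3, 0.7)], 3: []}
```

Note the edge from token 2 to token 3: a generated token influencing a later generated token is exactly the kind of interaction the article says prior methods overlook.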