Section 01
Introduction: Thought-Level Causal Intervention—A New Direction in Model Interpretability Research
This article introduces thought-level causal intervention, a research method for model interpretability. The method moves the analysis of reasoning processes from the traditional token level up to the thought level. Token-level methods struggle to capture reasoning at the level of human cognition, and the thought-level approach aims to address that limitation, offering a new perspective on the internal mechanisms of large language models. The method has two core components: a conceptual framework for thought levels and a technical implementation of causal intervention. Its stated advantages include semantic alignment and precise intervention.