Section 01
[Introduction] Actionable Mechanistic Interpretability: A Practical Guide to Locating, Guiding, and Improving Large Language Models
This article is a systematic review of mechanistic interpretability (MI) in large language models (LLMs), focusing on "actionable" MI techniques: those that allow researchers not only to understand a model's internal mechanisms but also to proactively locate specific functional circuits, guide model behavior, and make targeted improvements to model performance. This "locating-guiding-improving" closed loop moves MI from purely academic research toward practical application, opening new paths for tasks such as model editing and safety alignment.
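To make the "guiding" step of the loop concrete, the sketch below shows one widely used technique from this literature, activation steering: a direction is extracted from a contrastive prompt pair and added to the residual stream at inference time through a forward hook. This is a minimal illustration under stated assumptions, not a method prescribed by the review; the model name (gpt2), the layer index, and the scaling factor are placeholders one would tune in practice.

```python
# Minimal activation-steering sketch (PyTorch + Hugging Face transformers).
# Assumptions: gpt2 as a stand-in model; layer 6 as the intervention point;
# a difference-of-means steering vector from one contrastive prompt pair.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM with the same block layout works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def residual_at_layer(prompt: str, layer: int) -> torch.Tensor:
    """Residual-stream activation of the last token at a given layer."""
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    return out.hidden_states[layer][0, -1]  # shape: (hidden_dim,)

LAYER = 6  # assumed intervention layer; located via probing/patching in practice
# Steering direction: difference between a positive and a negative prompt.
steer = residual_at_layer("I love this movie.", LAYER) \
      - residual_at_layer("I hate this movie.", LAYER)

def add_steering(module, inputs, output):
    # GPT-2 blocks return a tuple; the hidden states come first.
    hidden = output[0] + 4.0 * steer / steer.norm()  # 4.0 is a free scale knob
    return (hidden,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(add_steering)
ids = tok("The film was", return_tensors="pt")
with torch.no_grad():
    gen = model.generate(**ids, max_new_tokens=20, do_sample=False)
handle.remove()
print(tok.decode(gen[0]))
```

The same hook-based pattern underlies the "locating" step as well: instead of adding a vector, one patches activations from a clean run into a corrupted run and measures which layers or heads restore the behavior of interest.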