Section 01
CircuitLasso: A Guide to Scalable LLM Circuit Learning via Sparse Linear Regression
CircuitLasso is a scalable circuit learning method for the mechanistic interpretability of large language models (LLMs). It recasts circuit learning as a sparse linear regression problem, which substantially reduces computational cost. Despite this lower cost, the circuits it recovers have structural accuracy comparable to those found by state-of-the-art intervention methods, and they reveal the paths along which semantic features propagate through the model. Because it can handle the high-dimensional feature spaces produced by sparse autoencoders (SAEs), the method offers a practical route toward understanding the internal mechanisms of LLMs.
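To make the regression framing concrete, here is a minimal NumPy sketch of one plausible reading of it. It is not the CircuitLasso implementation. Each SAE feature in a downstream layer is regressed on all SAE features in an upstream layer with an L1 (lasso) penalty, and the features with nonzero coefficients are kept as candidate circuit edges. The function names `lasso_cd` and `learn_edges`, the penalty `alpha`, the coordinate-descent solver, and the synthetic activations are all hypothetical choices made for illustration.

```python
import numpy as np


def soft_threshold(x: float, lam: float) -> float:
    """Proximal operator of the L1 norm: shrink x toward zero by lam."""
    return np.sign(x) * max(abs(x) - lam, 0.0)


def lasso_cd(X: np.ndarray, y: np.ndarray, alpha: float, n_iter: int = 200) -> np.ndarray:
    """Minimize (1/(2n)) * ||y - X w||^2 + alpha * ||w||_1 by cyclic coordinate descent.

    Assumes X and y are already centered, so no intercept is fitted.
    """
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0) / n       # per-feature curvature term
    residual = y.astype(float).copy()       # equals y - X @ w while w == 0
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue                    # constant feature: coefficient stays 0
            residual += X[:, j] * w[j]      # partial residual without feature j
            rho = X[:, j] @ residual / n
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
            residual -= X[:, j] * w[j]      # restore full residual with new w[j]
    return w


def learn_edges(upstream: np.ndarray, downstream: np.ndarray,
                alpha: float = 0.1, tol: float = 1e-6) -> list[tuple[int, int, float]]:
    """Hypothetical edge learner: one lasso regression per downstream feature.

    upstream:   (n_samples, n_up) SAE activations in an earlier layer
    downstream: (n_samples, n_down) SAE activations in a later layer
    Returns (upstream_idx, downstream_idx, weight) for every nonzero coefficient.
    """
    Xc = upstream - upstream.mean(axis=0)
    Yc = downstream - downstream.mean(axis=0)
    edges = []
    for k in range(Yc.shape[1]):
        w = lasso_cd(Xc, Yc[:, k], alpha)
        for i in np.flatnonzero(np.abs(w) > tol):
            edges.append((int(i), k, float(w[i])))
    return edges


# Synthetic stand-in for SAE activations (illustrative only):
# downstream feature 0 depends on upstream features 1 and 3,
# downstream feature 1 depends on upstream feature 0.
rng = np.random.default_rng(0)
up = rng.normal(size=(500, 6))
down = np.stack([2.0 * up[:, 1] - 1.5 * up[:, 3], up[:, 0]], axis=1)
down += 0.1 * rng.normal(size=down.shape)

edges = learn_edges(up, down, alpha=0.1)
for i, k, weight in edges:
    print(f"upstream {i} -> downstream {k}: {weight:+.2f}")
```

The L1 penalty is what keeps the recovered graph small: it drives most coefficients exactly to zero, so each downstream feature keeps only a few incoming edges even when the upstream dictionary is large. The penalty also shrinks the surviving weights slightly below their true values, which is why the printed weights come out a little smaller in magnitude than 2.0, -1.5, and 1.0.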