Section 01
BTP: A Research Framework for the Mechanistic Interpretability of Code Generation Capabilities in Large Language Models (Introduction)
The BTP project provides a complete toolchain and experimental framework for analyzing and pruning attention heads in large language models, probing the models' internal mechanisms on code generation benchmarks such as HumanEval, MBPP, and LiveCodeBench. The project aims to open the "black box" of code generation in large language models: it uses systematic methods to reveal the functional roles of individual attention heads, supporting improvements in model reliability, security, and efficiency.
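To make the core idea concrete, the sketch below shows one common way attention-head pruning is realized in interpretability work: each head's output is scaled by a 0/1 mask before the output projection, so a masked head contributes nothing to the residual stream. This is an illustrative toy implementation in NumPy, not BTP's actual API; all function and variable names here are hypothetical.

```python
import numpy as np

def attention_with_head_mask(x, Wq, Wk, Wv, Wo, n_heads, head_mask):
    """Toy multi-head self-attention with per-head pruning.

    head_mask[h] = 1 keeps head h, 0 prunes it (its output is zeroed
    before the output projection). Illustrative only.
    """
    seq_len, d_model = x.shape
    d_head = d_model // n_heads
    # Project inputs and split into heads.
    q = (x @ Wq).reshape(seq_len, n_heads, d_head)
    k = (x @ Wk).reshape(seq_len, n_heads, d_head)
    v = (x @ Wv).reshape(seq_len, n_heads, d_head)
    head_outputs = []
    for h in range(n_heads):
        scores = q[:, h] @ k[:, h].T / np.sqrt(d_head)
        # Numerically stable softmax over the key dimension.
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        # A pruned head (mask 0) contributes an all-zero slice.
        head_outputs.append(head_mask[h] * (weights @ v[:, h]))
    concat = np.concatenate(head_outputs, axis=-1)
    return concat @ Wo

# Example: compare the full model with head 0 pruned.
rng = np.random.default_rng(0)
d_model, n_heads = 8, 2
x = rng.standard_normal((3, d_model))
Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) for _ in range(4))
full_out = attention_with_head_mask(x, Wq, Wk, Wv, Wo, n_heads, np.array([1.0, 1.0]))
pruned_out = attention_with_head_mask(x, Wq, Wk, Wv, Wo, n_heads, np.array([0.0, 1.0]))
```

Comparing `full_out` and `pruned_out` on a benchmark task is the basic ablation experiment: if performance is unchanged with a head masked, that head is a candidate for pruning; if it degrades, the head plausibly carries task-relevant function.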