Section 01
Introduction: Mechanistic Validity—Establishing a Scientific Validation Framework for Neural Network Mechanistic Interpretability
This article introduces the Mechanistic Validity framework, a methodology that draws on philosophy of science, neuroscience, pharmacology, and measurement theory. It addresses a core problem in Mechanistic Interpretability (MI) research: how to verify that a reported discovery corresponds to a real mechanism. The framework comprises five validation lenses, six validation tiers, a claim taxonomy, and an open-source ecosystem, which together provide a rigorous evaluation benchmark for MI research. It moves the field from a "discovery" phase toward a "validation" phase, a shift that matters for AI safety.