Section 01
[Introduction] Mechanisms of Representation Steering: OV Circuit Dominance and Sparsification Potential
This study addresses the black-box nature of the internal mechanism of representation steering, using refusal behavior as a case study. Using a multi-token activation patching framework, it shows that steering vectors interact with the attention mechanism primarily through the OV circuit, and that they can be sparsified by 90-99% while maintaining performance. The findings offer a mechanistic understanding of steering and practical guidance for model alignment.
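To make the two central ideas concrete, the following is a minimal sketch rather than the study's actual procedure. It assumes that sparsification means keeping only the largest-magnitude entries of the vector (the summary does not state the criterion), and it uses random toy matrices in place of a real attention head's value and output weights. The names `sparsify_top_k`, `ov_output`, `w_v`, and `w_o` are hypothetical and chosen only for illustration.

```python
import numpy as np


def sparsify_top_k(vector, sparsity):
    """Zero all but the largest-magnitude entries.

    `sparsity` is the fraction of entries set to zero (0.90 keeps about 10%).
    Magnitude-based selection is an assumption made for this sketch.
    """
    keep = max(1, int(round(vector.size * (1.0 - sparsity))))
    # Indices of the `keep` entries with the largest absolute value.
    idx = np.argpartition(np.abs(vector), -keep)[-keep:]
    sparse = np.zeros_like(vector)
    sparse[idx] = vector[idx]
    return sparse


def ov_output(vector, w_v, w_o):
    """Linear map applied by one head's OV circuit to a residual-stream vector."""
    return vector @ w_v @ w_o


rng = np.random.default_rng(0)
d_model, d_head = 512, 64

# Toy stand-ins: a heavy-tailed "steering vector" and random head weights.
steering = rng.standard_normal(d_model) * rng.exponential(1.0, d_model)
w_v = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)
w_o = rng.standard_normal((d_head, d_model)) / np.sqrt(d_head)

sparse = sparsify_top_k(steering, sparsity=0.90)
dense_out = ov_output(steering, w_v, w_o)
sparse_out = ov_output(sparse, w_v, w_o)

# Cosine similarity between what the dense and sparse vectors write via OV.
cosine = float(
    dense_out @ sparse_out
    / (np.linalg.norm(dense_out) * np.linalg.norm(sparse_out))
)
print(f"nonzero entries kept: {np.count_nonzero(sparse)} of {d_model}")
print(f"cosine similarity of OV outputs: {cosine:.3f}")
```

Because the OV circuit acts linearly on whatever is added to the residual stream, comparing `dense_out` with `sparse_out` is one simple way to check how much of the steering signal survives pruning. With random weights the printed similarity says nothing about a real model; the sketch only illustrates the shape of the comparison.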