Section 01
[Introduction] Bayesian Theory of Attention Phase Transition: A First-Principles Explanation for Copy Head Emergence
This paper, 'Phase Transition in Attention: A Bayesian Theory of Copy Head Emergence', was released on arXiv on June 10, 2026 (original link: http://arxiv.org/abs/2606.12058v1). Its core idea: by analyzing the training of a single-layer softmax attention network on the copy task with Bayesian feature learning theory, the authors find that softmax attention exhibits a first-order phase transition (an abrupt change in pattern), whereas linear attention undergoes a second-order phase transition followed by smooth evolution. This provides a first-principles explanation for the sudden emergence of copy circuits in Transformers.
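To make the two attention variants concrete, here is a minimal toy sketch of a single attention read-out on a copy-style input. It is not the paper's model or its Bayesian analysis, and it does not reproduce the phase transition. It only contrasts how a softmax read-out and a linear read-out respond as the score on the target position grows. The names `strength`, `scores_for`, `softmax_attention`, and `linear_attention`, the 4-token input, and the `1/n` normalization of the linear variant are all assumptions chosen for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_attention(scores, values):
    # Output is a convex combination of the values, so it stays bounded.
    weights = softmax(scores)
    return sum(w * v for w, v in zip(weights, values))

def linear_attention(scores, values):
    # Assumed linear variant: raw scores used as weights, divided by length.
    n = len(scores)
    return sum(s * v for s, v in zip(scores, values)) / n

# Toy copy task (hypothetical): reproduce the value stored at `target`.
values = [0.0, 1.0, 0.0, 0.0]
target = 1

def scores_for(strength):
    # Score `strength` on the target position, zero everywhere else.
    return [strength if i == target else 0.0 for i in range(len(values))]

results = []
for strength in [0.0, 1.0, 2.0, 4.0, 8.0]:
    s = scores_for(strength)
    results.append((strength, softmax_attention(s, values), linear_attention(s, values)))

for strength, soft_out, lin_out in results:
    print(f"strength={strength:4.1f}  softmax={soft_out:.3f}  linear={lin_out:.3f}")
```

In this sketch the softmax output starts at the uniform average and saturates toward the copied value as `strength` increases, while the linear output simply scales in proportion to `strength`. How these different nonlinearities lead to first-order versus second-order transitions during training is the subject of the paper's analysis and is not captured here.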