Section 01
[Introduction] Padding Token Reasoning: MIT Uncovers Temporal Dynamics in Language Model Reasoning
MIT researchers found that inserting meaningless padding tokens during language model inference can significantly improve reasoning accuracy. This counterintuitive result challenges the conventional understanding of the Transformer architecture, reveals temporal dynamics in how large language models (LLMs) reason internally, and opens a new window onto their working mechanisms.
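To make the idea concrete, here is a minimal sketch (not the researchers' actual code) of what "padding-token" prompting can look like: semantically empty filler tokens are appended to the prompt, giving the model extra forward passes before it must produce an answer. The token choice `"..."`, the helper name `pad_prompt`, and the pad count are illustrative assumptions.

```python
# Illustrative sketch: append meaningless filler tokens to a prompt.
# PAD_TOKEN and n_pads are hypothetical choices, not from the study.

PAD_TOKEN = "..."  # a semantically empty filler token


def pad_prompt(question: str, n_pads: int) -> str:
    """Append n_pads filler tokens after the question, so the model
    performs extra forward passes before emitting its answer."""
    return question + " " + " ".join([PAD_TOKEN] * n_pads)


prompt = pad_prompt("What is 17 * 24?", 5)
print(prompt)  # → What is 17 * 24? ... ... ... ... ...
```

The key point the article highlights is that these tokens carry no information themselves; any accuracy gain must come from the additional computation the model performs while processing them.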