Large Language Models (LLMs) have demonstrated remarkable capabilities across natural language processing tasks, yet they are puzzlingly fragile at basic arithmetic. This apparent contradiction suggests a gap between the model's internal computation and its discrete output. Why does a model that generates fluent prose and writes complex code so often make mistakes in simple addition?
Traditional research often treats LLMs as black boxes, inferring their internal mechanisms from input-output behavior. However, this approach struggles to reveal the internal representations a model actually uses when it performs arithmetic. The RL-MIND team's research takes a different approach: by analyzing the geometric structure of the residual stream while the model performs multi-operand addition, they attempt to understand the arithmetic ability of LLMs from the inside.