Zing Forum

Reasoning models don't just think longer—their internal trajectories are truly different

Recent research finds that when reasoning-trained language models face difficult problems, their internal hidden state trajectories exhibit distinct geometric characteristics compared to instruction-tuned models, with this difference being most pronounced in the code domain.

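Tags: reasoning models, chain of thought, hidden states, trajectory geometry, code generation, large language models, machine learning, artificial intelligence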
Published 2026-05-15 06:37 | Recent activity 2026-05-18 11:47 | Estimated read 5 min

Section 01

Reasoning models' internal trajectories are truly different; the difference is most significant in the code domain

Recent research finds that when reasoning-trained language models solve difficult problems, the geometry of their internal hidden state trajectories differs systematically from that of ordinary instruction-tuned models, and the difference is most pronounced in the code domain. This post covers the background, methods, findings, and significance of the study.

Section 02

Research Background and Core Questions

In recent years, reasoning models such as OpenAI's o-series and DeepSeek-R1 have shown strong performance on complex problems, often while generating longer chains of thought. Generation length alone, however, cannot reveal whether a model follows a different internal strategy or merely extends its computation mechanically. The research team set out to answer this question by analyzing hidden state trajectories.

Section 03

Research Methods: Trajectory Geometric Analysis and Length Correction

The research team designed an analytical framework to compare reasoning-trained models with instruction-tuned baselines in three domains: competitive programming, mathematical reasoning, and Boolean satisfiability (SAT) problems. They tracked hidden state sequences, treated them as high-dimensional trajectories, and analyzed properties such as curvature and its heterogeneity. The key innovation is a "length correction" step, which separates geometric patterns related to problem difficulty from the effect of generation length.
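To make the idea of length correction concrete, here is a minimal sketch. The post does not say how the study performs the correction, so this is an assumption: it regresses each per-sample trajectory metric on log generation length and keeps the residual. The function name `length_corrected` and the toy data are hypothetical.

```python
import numpy as np

def length_corrected(metric, lengths):
    """Residual of a per-sample metric after regressing out log length.

    ASSUMPTION: ordinary least squares on log(length) stands in for the
    study's correction, whose exact form the post does not describe.
    """
    metric = np.asarray(metric, dtype=float)
    log_len = np.log(np.asarray(lengths, dtype=float))
    design = np.column_stack([np.ones_like(log_len), log_len])
    coef, *_ = np.linalg.lstsq(design, metric, rcond=None)
    return metric - design @ coef

# Toy data (hypothetical): the raw metric depends strongly on length
# and only weakly on difficulty.
rng = np.random.default_rng(0)
lengths = rng.integers(50, 2000, size=200)
difficulty = rng.uniform(0.0, 1.0, size=200)
raw = 0.5 * np.log(lengths) + 0.2 * difficulty + rng.normal(0.0, 0.01, size=200)

corrected = length_corrected(raw, lengths)

# After correction, the metric no longer tracks length,
# while the difficulty signal remains visible.
print(np.corrcoef(corrected, np.log(lengths))[0, 1])
print(np.corrcoef(corrected, difficulty)[0, 1])
```

Without a step like this, any metric that grows with the number of generated tokens would make longer outputs look geometrically different for trivial reasons.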

Section 04

Core Findings: Significant Differences in the Code Domain

In the code domain, as programming problems become harder, the length-corrected trajectories of reasoning-trained models become more "direct" (more focused, efficient paths), and their local curvature heterogeneity drops significantly (more consistent, stable internal representation strategies). The baseline models show no such pattern, which suggests that reasoning training changes the internal mechanism rather than merely increasing the amount of computation.
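The two quantities named above can be illustrated with simple stand-in definitions. These are assumptions, not the study's formulas: directness is taken as net displacement divided by total path length, and local curvature heterogeneity as the standard deviation of turning angles between consecutive steps. All function names are hypothetical.

```python
import numpy as np

def directness(states):
    """Net displacement over total path length; 1.0 is a straight line."""
    states = np.asarray(states, dtype=float)
    steps = np.diff(states, axis=0)
    path_length = np.linalg.norm(steps, axis=1).sum()
    if path_length == 0:
        return 0.0
    return float(np.linalg.norm(states[-1] - states[0]) / path_length)

def turning_angles(states):
    """Angles (radians) between consecutive non-zero steps."""
    steps = np.diff(np.asarray(states, dtype=float), axis=0)
    norms = np.linalg.norm(steps, axis=1)
    moving = norms > 0
    units = steps[moving] / norms[moving, None]
    cosines = np.sum(units[:-1] * units[1:], axis=1)
    return np.arccos(np.clip(cosines, -1.0, 1.0))

def curvature_heterogeneity(states):
    """Spread of turning angles; 0.0 means uniform local curvature."""
    angles = turning_angles(states)
    return float(angles.std()) if angles.size else 0.0

# Toy 2-D trajectories; real hidden states would have shape
# (tokens, hidden_dim).
straight = [[0, 0], [1, 0], [2, 0], [3, 0]]
zigzag = [[0, 0], [1, 0], [1, 1], [2, 1], [3, 1]]

print(directness(straight), curvature_heterogeneity(straight))
print(directness(zigzag), curvature_heterogeneity(zigzag))
```

Under these definitions, the pattern reported for reasoning models on hard code problems corresponds to higher directness and lower heterogeneity once length has been corrected for.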

Section 05

Performance in Mathematical and Boolean Satisfiability Domains

In mathematical reasoning and SAT problems, similar trends appeared, but the effects were weaker than in the code domain. One possible explanation is that programming tasks have more explicit structure and richer intermediate verification points, whereas mathematical and logical problems involve more manipulation of abstract concepts, which leads to more complex geometry in the internal representations.

Section 06

Behavioral Annotation and Strategy Shift Verification

Behavioral annotation analysis shows that stronger length-corrected geometric coupling co-occurs with strategy shifts and uncertainty monitoring. Linear probes applied during the prompt phase did not reproduce the separation seen in the code domain, which suggests that the distinctive geometry of reasoning models emerges mainly during generation.
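For readers unfamiliar with linear probes, the sketch below shows the general technique: fit a linear classifier on hidden-state features and check held-out accuracy. It is illustrative only. The post does not specify the study's probe targets, layers, or training setup, so the ridge least-squares probe, the function names, and the synthetic features here are all assumptions.

```python
import numpy as np

def fit_linear_probe(features, labels, ridge=1e-3):
    """Ridge-regularised least-squares probe for 0/1 labels (a stand-in)."""
    X = np.column_stack([features, np.ones(len(features))])  # add bias
    y = 2.0 * np.asarray(labels, dtype=float) - 1.0          # map to -1/+1
    gram = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ y)

def probe_accuracy(weights, features, labels):
    """Fraction of samples whose sign prediction matches the label."""
    X = np.column_stack([features, np.ones(len(features))])
    predictions = (X @ weights > 0).astype(int)
    return float((predictions == np.asarray(labels)).mean())

# Synthetic stand-ins for hidden states (hypothetical data).
rng = np.random.default_rng(0)
n, d = 400, 16
labels = rng.integers(0, 2, size=n)
no_signal = rng.normal(size=(n, d))            # labels not encoded
with_signal = no_signal.copy()
with_signal[:, 0] += 4.0 * (2 * labels - 1)    # labels encoded on one axis

train, test = slice(0, 300), slice(300, None)
acc_signal = probe_accuracy(
    fit_linear_probe(with_signal[train], labels[train]),
    with_signal[test], labels[test])
acc_null = probe_accuracy(
    fit_linear_probe(no_signal[train], labels[train]),
    no_signal[test], labels[test])

print(acc_signal, acc_null)
```

A prompt-phase result like the one described above corresponds to the second case: the probe stays near chance, so the property of interest is not linearly readable from those states.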

Section 07

Research Significance and Future Directions

This study establishes length correction as a prerequisite for generating trajectory analysis, provides empirical support for the existence of reasoning ability, and the significant effect in the code domain provides clues for targeted model optimization. In the future, we can explore the application of trajectory geometric analysis in model diagnosis, ability prediction, and training optimization to help build more reliable and interpretable AI systems.