Section 01
[Introduction] Monitoring Inner Monologue: Probing Trajectories Reveal the Dynamic Behavior of Reasoning Models
This article introduces a study published in May 2026. To address the unreliability of Chain-of-Thought (CoT) in Large Reasoning Models (LRMs), the study proposes the probing trajectory method: it monitors the model's internal hidden representations and evaluates a detector at each generated token position, so that the per-token outputs form a trajectory. The study finds that complete trajectories distinguish future behaviors more easily than single static predictions do. Max-pooling can reach an AUROC of 95%, offering a new perspective on LRM safety monitoring.
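To make the idea concrete, here is a minimal sketch of a probing trajectory. It is not the study's implementation. It assumes the detector is a linear probe with a sigmoid output, and that the hidden states for one generation are available as an array of shape `(T, d)`. The names `probe_trajectory`, `max_pool_score`, and `auroc` are hypothetical and used only for illustration.

```python
import numpy as np

def probe_trajectory(hidden_states, w, b):
    """Score every token position with a linear probe (assumed detector form).

    hidden_states: array of shape (T, d), one hidden vector per generated token.
    w, b: probe weights of shape (d,) and a scalar bias.
    Returns an array of shape (T,) with one score in (0, 1) per position.
    """
    logits = hidden_states @ w + b
    return 1.0 / (1.0 + np.exp(-logits))

def max_pool_score(trajectory):
    """Collapse a trajectory into one score by taking its maximum."""
    return float(np.max(trajectory))

def auroc(scores, labels):
    """Rank-based AUROC: chance that a positive outscores a negative (ties count half)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (len(pos) * len(neg)))
```

A single static prediction would correspond to reading one fixed position, for example `trajectory[0]`. The max-pooled score instead flags a generation if the detector fires at any position, which is one plausible reason a trajectory-level summary could separate behaviors better.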