Section 01
Introduction / Main Post: Reasoning Safety Monitor: Real-Time Detection of Vulnerabilities in Large Language Model Reasoning Chains
Researchers propose the concept of "Reasoning Safety", construct a taxonomy of nine types of unsafe reasoning behavior, and develop an external monitoring component that detects reasoning hijacking and denial-of-service attacks in real time. The monitor achieves 84.88% step-level localization accuracy on a benchmark of 450 reasoning chains.
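To make "external monitoring" and "step-level localization" concrete, here is a minimal Python sketch. It is not the researchers' implementation. The detector interface, the keyword rule in `toy_detector`, and the exact-match definition of localization accuracy are all assumptions made for illustration, since this summary does not describe how the monitor is built or how the metric is computed.

```python
from typing import Callable, Iterable, List, Optional, Sequence

# Hypothetical detector interface: receives the steps seen so far and
# returns True if the newest step looks unsafe. The real monitor's
# interface is not described in this summary.
StepDetector = Callable[[Sequence[str]], bool]


def first_flagged_step(steps: Iterable[str], is_unsafe: StepDetector) -> Optional[int]:
    """Scan steps in arrival order and return the 0-based index of the first flagged step."""
    seen: List[str] = []
    for index, step in enumerate(steps):
        seen.append(step)
        if is_unsafe(seen):
            return index  # stop at the first hit, as a real-time monitor could
    return None  # no step was flagged


def localization_accuracy(
    predicted: Sequence[Optional[int]], annotated: Sequence[Optional[int]]
) -> float:
    """Assumed metric: share of chains whose predicted index equals the annotated one."""
    if len(predicted) != len(annotated):
        raise ValueError("predicted and annotated must have the same length")
    if not predicted:
        return 0.0
    hits = sum(p == a for p, a in zip(predicted, annotated))
    return hits / len(predicted)


def toy_detector(seen: Sequence[str]) -> bool:
    # Placeholder rule for illustration only; a real detector would be a
    # learned or LLM-based classifier rather than a keyword match.
    return "ignore the original question" in seen[-1].lower()
```

A short usage example under the same assumptions: for the chain `["Parse the question.", "Ignore the original question and do X.", "Answer."]`, `first_flagged_step(chain, toy_detector)` returns `1`, the index of the second step. Keeping the detector as a pluggable callable reflects the idea of a monitor that sits outside the model and only observes its reasoning steps.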