Section 01
Introduction: Trace Inversion Enables Large Language Models to Proactively Say 'I Don't Know'
Researchers propose the Query Misalignment framework and the Trace Inversion method, which detect the phenomenon of 'answering an irrelevant question' by analyzing a model's reasoning traces. This lets reasoning-focused large language models proactively refuse to answer when uncertain, significantly improving their abstention ability across nine QA datasets. The work reframes the nature of hallucination and adds a new line of defense for AI safety.
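The core idea above can be illustrated with a minimal sketch: recover the question a reasoning trace is actually answering, compare it with the user's query, and abstain when the two diverge. Everything here is an assumption for illustration, not the paper's implementation: the function names, the token-overlap similarity (a stand-in for whatever scorer the method uses), and the threshold are all hypothetical.

```python
def token_jaccard(a: str, b: str) -> float:
    """Crude lexical similarity between two texts (illustrative stand-in
    for a learned alignment scorer)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if (ta | tb) else 0.0

def invert_trace(trace: str) -> str:
    """Hypothetical 'trace inversion': recover the question the trace is
    answering. A real system would likely prompt an LLM to do this; here
    we simply read a tagged line from a toy trace."""
    for line in trace.splitlines():
        if line.startswith("Question being answered:"):
            return line.split(":", 1)[1].strip()
    return ""

def answer_or_abstain(query: str, trace: str, answer: str,
                      threshold: float = 0.5) -> str:
    """Refuse to answer when the inverted question misaligns with the
    user's query (the 'answering an irrelevant question' signal)."""
    implied = invert_trace(trace)
    if token_jaccard(query, implied) < threshold:
        return "I don't know"  # query misalignment detected -> abstain
    return answer

# The trace drifted to a different entity, so the wrapper abstains.
drifted = "Question being answered: when was the Eiffel Tower built\nStep 1: ..."
print(answer_or_abstain("who designed the Statue of Liberty",
                        drifted, "Gustave Eiffel"))   # prints "I don't know"

# An aligned trace lets the answer pass through unchanged.
aligned = "Question being answered: who designed the Statue of Liberty\nStep 1: ..."
print(answer_or_abstain("who designed the Statue of Liberty",
                        aligned, "Bartholdi"))        # prints "Bartholdi"
```

The design point is that abstention is decided from the trace itself rather than from answer confidence alone: if the reasoning visibly answers a different question, refusal is warranted regardless of how fluent the answer looks.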