Section 01
[Introduction] Analysis of the Research Framework for the 'Correct Answer but Wrong Reason' Phenomenon in Open-Source Reasoning Models
This study builds a framework for detecting "shortcut-driven reasoning" (i.e., reaching the correct answer for the wrong reason) in open-weight reasoning models. The framework combines behavioral testing with mechanistic interpretability methods to assess whether a model reaches correct answers through genuine reasoning or through superficial shortcuts. Key finding: in small models with fewer than 2 billion parameters, reasoning failures stem mainly from "confused reasoning" rather than from "shortcut dependence." The framework thus offers a systematic tool for understanding and improving the reasoning capabilities of small models.