Section 01
Unveiling the Hidden Dimensions of LLM Reflection Capabilities: Enabling Controllable Self-Correction via Activation Intervention
A recent study uses activation intervention to reveal, for the first time, the internal mechanism behind reflection in large language models (LLMs). The researchers found that reflective behavior falls into three levels: no reflection (the model answers directly, without intermediate reasoning), internal reflection (the model spontaneously corrects itself during generation), and triggered reflection (the model reflects when explicitly instructed to). The study was conducted jointly by researchers from National Taiwan University and Academia Sinica. It offers a new perspective on how LLMs self-correct, and it raises both opportunities and challenges for model optimization and safety.
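The article does not describe the study's exact intervention procedure. To make the idea of "activation intervention" concrete, here is a minimal sketch of one common variant, difference-of-means steering, offered as an illustration rather than the authors' method. It assumes you already have hidden-state vectors from one layer for reflective and non-reflective examples. The function names (`reflection_direction`, `intervene`, `projection`), the scaling factor `alpha`, and the synthetic data are all hypothetical.

```python
import numpy as np


def reflection_direction(reflective_acts: np.ndarray, plain_acts: np.ndarray) -> np.ndarray:
    """Unit vector from the mean 'plain' activation to the mean 'reflective' one.

    Both inputs have shape (n_examples, hidden_dim). Hypothetical inputs:
    in a real model these would be hidden states captured at one layer.
    """
    diff = reflective_acts.mean(axis=0) - plain_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)


def intervene(hidden: np.ndarray, direction: np.ndarray, alpha: float) -> np.ndarray:
    """Shift every token's hidden state by alpha along the direction.

    alpha > 0 pushes toward the 'reflective' side, alpha < 0 pushes away.
    hidden has shape (seq_len, hidden_dim).
    """
    return hidden + alpha * direction


def projection(hidden: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Per-token score: how far each hidden state lies along the direction."""
    return hidden @ direction


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hidden_dim = 8
    shift = np.zeros(hidden_dim)
    shift[0] = 3.0  # synthetic assumption: 'reflective' states are offset along axis 0
    plain = rng.normal(size=(100, hidden_dim))
    reflective = rng.normal(size=(100, hidden_dim)) + shift

    d = reflection_direction(reflective, plain)
    h = rng.normal(size=(5, hidden_dim))  # a toy 5-token sequence
    print(projection(h, d))
    print(projection(intervene(h, d, alpha=4.0), d))  # each score rises by 4.0
```

Because the direction is normalized to unit length, adding `alpha * direction` raises every token's projection by exactly `alpha`, which is what makes the intervention strength easy to control. In a real model the shift would be applied inside the forward pass (for example via a layer hook) rather than to standalone arrays.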