Zing Forum

Reasoning Trace Collapse: How Fine-tuning Quietly Undermines Explicit Reasoning Models

This paper describes Reasoning Trace Collapse, a failure mode of explicit reasoning models under downstream fine-tuning: the model still produces correct answers but stops producing structured intermediate reasoning. It proposes a structural evaluation framework to detect the problem and a loss masking strategy to mitigate it.

Tags: Explicit reasoning models, Fine-tuning, Chain-of-thought, Interpretability, Evaluation framework, AI safety
Published 2026-05-20 20:58 | Recent activity 2026-05-21 11:56 | Estimated read 6 min
Section 01

[Introduction] Reasoning Trace Collapse: A Hidden Crisis in Fine-tuning Explicit Reasoning Models

The paper identifies Reasoning Trace Collapse in explicit reasoning models (e.g., DeepSeek-R1, OpenAI o1) during downstream fine-tuning: answers stay correct while the structured intermediate reasoning disappears. Because answer accuracy is unaffected, the collapse is easy to miss, yet it undermines the model's interpretability and reliability. The study proposes a structural evaluation framework to detect the problem and a loss masking strategy to mitigate it, offering practical guidance for fine-tuning and deploying explicit reasoning models.

Section 02

Background: The Rise of Explicit Reasoning Models and Fine-tuning Challenges

In recent years, explicit reasoning models have excelled at complex tasks by generating detailed intermediate reasoning (e.g., chain-of-thought) before answering. This brings three main advantages: interpretability, reliability, and the ability to handle complex tasks. Downstream task data, however, often contains only instruction-response pairs with no intermediate reasoning traces, and this mismatch is the key challenge when fine-tuning such models.

Section 03

Phenomenon: Definition and Harms of Reasoning Trace Collapse

The study reports the Reasoning Trace Collapse phenomenon: after an explicit reasoning model is fine-tuned on data without reasoning traces, it can still output correct answers but no longer produces structurally valid explicit reasoning traces. In effect, it degenerates from explicit reasoning to implicit reasoning. The harms are fourfold: correct answers mask the problem, interpretability is lost, reliability decreases, and errors become harder to locate and correct.

Section 04

Method: A Structural Evaluation Framework That Separates Answers from Reasoning

To study the collapse quantitatively, the team developed a structural evaluation framework that sorts each reasoning trace into one of four categories: valid reasoning (present and logically coherent), empty reasoning (present but with no meaningful content), missing reasoning (the model outputs the answer directly), and truncated reasoning (the trace stops midway). The framework also introduces reasoning-conditional performance, which measures task performance only over outputs whose reasoning is valid and so reflects the model's actual explicit reasoning ability.
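The four categories and the reasoning-conditional metric can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the `<think>...</think>` delimiters, the `MIN_WORDS` threshold, and the function names `classify_trace` and `evaluate` are all assumptions made for the example, and a word count is only a crude structural stand-in for "logically coherent".

```python
from collections import Counter

# Assumption: reasoning is wrapped in <think>...</think> tags.
THINK_OPEN, THINK_CLOSE = "<think>", "</think>"
# Hypothetical threshold: traces shorter than this count as "empty".
MIN_WORDS = 5


def classify_trace(output: str) -> str:
    """Return 'missing', 'truncated', 'empty', or 'valid' for one model output."""
    start = output.find(THINK_OPEN)
    if start == -1:
        return "missing"      # the answer was produced with no trace at all
    end = output.find(THINK_CLOSE, start)
    if end == -1:
        return "truncated"    # the trace was opened but never closed
    body = output[start + len(THINK_OPEN):end]
    if len(body.split()) < MIN_WORDS:
        return "empty"        # tags are present but hold no real content
    return "valid"            # structural proxy only; coherence is not checked


def evaluate(outputs, correct):
    """Report answer accuracy alongside structural reasoning metrics."""
    labels = [classify_trace(o) for o in outputs]
    counts = Counter(labels)
    n = len(outputs)
    # Correctness flags restricted to outputs whose trace is valid.
    valid_correct = [c for label, c in zip(labels, correct) if label == "valid"]
    return {
        "answer_accuracy": sum(correct) / n,
        "valid_reasoning_rate": counts["valid"] / n,
        "reasoning_conditional_accuracy": (
            sum(valid_correct) / len(valid_correct) if valid_correct else None
        ),
        "counts": dict(counts),
    }
```

Reporting `answer_accuracy` next to `valid_reasoning_rate` is the point of the design: the first can stay high while the second falls, which is the situation an answer-only metric would hide.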

Section 05

Experimental Evidence: Collapse Speed and Evaluation Bias

Experiments on four open-source reasoning models produced two findings: 1. Standard supervised fine-tuning (SFT) can reduce the proportion of valid reasoning after very little training; 2. Answer-only metrics seriously mask the problem: conditional performance remains high while the valid reasoning rate drops sharply, so researchers may judge fine-tuning a success when the core ability has actually been impaired.

Section 06

Mitigation Strategy: Loss Masking, a Protection Method That Needs No Additional Reasoning Traces

The paper proposes loss masking to mitigate the collapse: when computing the training loss, the reasoning-trace portion of the sequence is treated separately, either excluded from the loss entirely (full masking) or given a reduced weight (partial masking). The method requires no teacher-generated reasoning traces; changing only the loss computation significantly reduces the collapse while preserving both task performance and explicit reasoning ability.
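As a rough sketch of the idea, full and partial masking can both be written as a weighted negative log-likelihood in which reasoning-span tokens receive a weight below 1. The summary does not say exactly which tokens count as the reasoning-trace portion or how the loss is normalized, so the per-token `is_reasoning` flags, the `reasoning_weight` parameter, and the weight-sum normalization below are assumptions for illustration only.

```python
import math


def masked_nll(token_log_probs, is_reasoning, reasoning_weight=0.0):
    """Weighted mean negative log-likelihood over target tokens.

    token_log_probs: log-probability the model assigns to each target token.
    is_reasoning:    True where the token belongs to the reasoning span.
    reasoning_weight: 0.0 = full masking, between 0 and 1 = partial masking,
                      1.0 = standard SFT loss (no masking).
    """
    if len(token_log_probs) != len(is_reasoning):
        raise ValueError("token_log_probs and is_reasoning must align")
    weights = [reasoning_weight if flag else 1.0 for flag in is_reasoning]
    total_weight = sum(weights)
    if total_weight == 0:
        return 0.0  # every token is masked, so nothing contributes
    weighted_sum = sum(w * lp for w, lp in zip(weights, token_log_probs))
    return -weighted_sum / total_weight


# Toy sequence: two reasoning-span tokens followed by one answer token.
log_probs = [math.log(0.5), math.log(0.25), math.log(0.5)]
flags = [True, True, False]

full = masked_nll(log_probs, flags, reasoning_weight=0.0)      # answer token only
partial = masked_nll(log_probs, flags, reasoning_weight=0.5)   # down-weighted span
standard = masked_nll(log_probs, flags, reasoning_weight=1.0)  # plain mean NLL
```

In a typical PyTorch training loop, full masking is commonly achieved by setting the labels of the masked positions to the ignore index (-100 by default in `torch.nn.CrossEntropyLoss`), while partial masking needs an explicit per-token weight like the one above.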

Section 07

Practical Recommendations and Research Insights

Practical Recommendations: 1. Include reasoning reliability metrics in evaluations (proportion of valid reasoning, conditional performance, etc.); 2. When fine-tuning on data without reasoning traces, use loss masking and monitor reasoning quality; 3. Consider adding reasoning traces to the training data, whether generated by a teacher model or annotated manually; 4. Continuously monitor reasoning behavior in production environments.
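The monitoring recommendations could be approximated with a simple check that compares the valid reasoning rate of recent outputs against a pre-fine-tuning baseline. This is only a sketch: the label strings, the function names, and the 20% relative-drop threshold are hypothetical choices, not values from the paper.

```python
def valid_reasoning_rate(labels):
    """Fraction of outputs whose reasoning trace was labeled 'valid'."""
    if not labels:
        raise ValueError("labels must not be empty")
    return sum(1 for label in labels if label == "valid") / len(labels)


def collapse_alert(baseline_labels, current_labels, max_relative_drop=0.2):
    """Return True if the valid reasoning rate fell by more than the allowed
    fraction of its baseline value (threshold is a hypothetical default)."""
    baseline = valid_reasoning_rate(baseline_labels)
    current = valid_reasoning_rate(current_labels)
    if baseline == 0:
        return False  # no valid reasoning in the baseline, so nothing to lose
    return (baseline - current) / baseline > max_relative_drop


# Example: 90% valid before fine-tuning, 50% valid afterwards.
before = ["valid"] * 9 + ["missing"]
after = ["valid"] * 5 + ["missing"] * 5
alert = collapse_alert(before, after)
```

The same check can run on fine-tuning checkpoints or on sampled production traffic, as long as each output is first assigned a structural label.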

Research Insights: Performance does not equal ability, and a single metric can easily mask behavioral changes. Fine-tuning calls for caution, since standard SFT may degrade abilities the model already had. Protecting explicit reasoning ability is key to building trustworthy AI.