Zing Forum


Over-Explaining? A Study on the Impact of Large Model Reasoning Traces on User Performance and Metacognition

A pre-registered experiment with 559 participants found that full reasoning traces reduce user performance and lead to overconfidence, while concise summaries maintain performance and improve trust, suggesting that reasoning traces should be treated as interface elements rather than cognitive windows.

AI transparency, Explainable AI, Reasoning traces, Cognitive bias, Overconfidence, Human-computer interaction, Chain-of-Thought, Metacognition
Published 2026-05-25 21:46

Section 01

Introduction: Core Summary of the Study on the Impact of Large Model Reasoning Traces on User Performance and Metacognition

A pre-registered experiment with 559 participants found that full reasoning traces reduce user performance and lead to overconfidence, while concise summaries maintain performance and increase trust. This suggests that reasoning traces should be treated as interface elements rather than as windows into the model's cognition. The study challenges the intuition that "more explanation = better understanding" and offers key insights for AI transparency design.


Section 02

Background: The 'Chatty' Trend of AI Assistants and Questions About Transparency

Current AI assistants (e.g., Claude, ChatGPT) often display long reasoning processes, on the premise that transparency helps users understand the answer and builds trust. But does this design actually benefit users, or can excessive explanation backfire? These are the core questions the study sets out to answer.


Section 03

Research Method: Pre-registered Experiment Design with 559 Participants

The experiment used a randomized controlled design. Participants completed 10 LSAT logic questions under three conditions:

  1. Answer-only group: the answer alone, with no reasoning shown
  2. Full trace group: the full, detailed reasoning shown before the answer
  3. Summary trace group: the answer plus a concise reasoning summary

Measures included task performance, subjective trust, satisfaction, and metacognitive calibration.

Section 04

Key Findings: Full Traces Harm Performance, Summary Traces Are the 'Sweet Spot', Overconfidence Is Prevalent

  1. The full trace group performed significantly worse than the answer-only group. Possible explanations include cognitive overload, passive acceptance of the model's reasoning, and anchoring on the trace.
  2. The summary trace group performed comparably to the answer-only group while reporting higher trust and satisfaction.
  3. Overconfidence was present in every group, and no reasoning format improved the calibration of participants' self-assessments.
  4. Overconfidence stemmed from satisfaction with the interaction (processing fluency) rather than from trust.
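To make "overconfidence" and "calibration" concrete, here is a minimal Python sketch. The summary does not say how calibration was scored, so this assumes each participant estimated how many of the 10 questions they answered correctly, and defines overconfidence as that estimate minus the actual score, expressed as a fraction of the 10 questions. The `Participant` type, the condition labels, and the function names are hypothetical illustrations, not the study's analysis code.

```python
from dataclasses import dataclass
from statistics import mean

N_QUESTIONS = 10  # the study used 10 LSAT logic questions


@dataclass
class Participant:
    condition: str          # hypothetical labels, e.g. "answer_only", "full_trace", "summary_trace"
    correct: int            # questions actually answered correctly (0..10)
    estimated_correct: int  # ASSUMED measure: participant's own estimate of their score


def overconfidence(p: Participant) -> float:
    """Estimated minus actual score as a fraction of all questions.

    Positive means the participant overestimated their performance;
    0 means perfectly calibrated under this assumed measure.
    """
    return (p.estimated_correct - p.correct) / N_QUESTIONS


def mean_overconfidence_by_condition(participants: list[Participant]) -> dict[str, float]:
    """Average the per-participant overconfidence within each condition."""
    groups: dict[str, list[float]] = {}
    for p in participants:
        groups.setdefault(p.condition, []).append(overconfidence(p))
    return {cond: mean(scores) for cond, scores in groups.items()}
```

Under this assumed measure, finding 3 would correspond to every condition's mean staying above zero regardless of which reasoning format was shown.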

Section 05

Theoretical Implications: Reasoning Traces Are Interface Elements, Not Cognitive Windows

The study challenges the assumption that reasoning traces are windows into the model's cognition and instead proposes that:

  1. Reasoning traces should be treated as interface design elements.
  2. They should not be expected to deliver educational value automatically.
  3. Designers should watch for overconfidence induced by smooth, fluent interactions.
  4. Transparency should be redefined as helping users form their own understanding.

Section 06

Practical Recommendations: Optimization Directions for Reasoning Display

  1. Default to concise reasoning summaries.
  2. Let users work through the problem on their own before showing the AI's answer.
  3. Clearly distinguish between the role of explanation and the role of evidence.
  4. Be alert to users' "explanation illusion", and build in mechanisms that test their actual understanding.
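As an illustration of treating traces as interface elements, the sketch below expresses recommendations 1 and 2 as a simple display policy in Python. Everything in it is hypothetical: `ModelResponse`, `render_response`, and the display strings are invented for illustration and do not come from the study or any real product. The AI's answer is withheld until the user has entered their own attempt, the concise summary is shown by default, and the full trace appears only on request.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelResponse:
    answer: str      # the model's final answer
    summary: str     # concise reasoning summary (the format the study found to be the "sweet spot")
    full_trace: str  # complete reasoning trace


def render_response(
    resp: ModelResponse,
    user_attempt: Optional[str],
    show_full_trace: bool = False,
) -> str:
    """Hypothetical 'think first, summary by default' display policy."""
    # Recommendation 2: require the user's own answer before revealing the AI's.
    if user_attempt is None:
        return "Enter your own answer before viewing the AI's answer."
    lines = [
        f"Your answer: {user_attempt}",
        f"AI answer: {resp.answer}",
        f"Why (summary): {resp.summary}",
    ]
    # Recommendation 1: the full trace is opt-in rather than shown by default.
    if show_full_trace:
        lines.append(f"Full reasoning: {resp.full_trace}")
    return "\n".join(lines)
```

Keeping the full trace behind an explicit flag reflects the finding that full traces hurt performance, while still leaving it available to users who want to inspect it.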

Section 07

Limitations and Future Research Directions

Limitations: the study was limited in task type (LSAT logic questions), user background (general population), and model type (open-source models).

Future directions: exploring interactive explanations, personalized traces, and reasoning display strategies optimized for educational settings.