# Over-Explaining? A Study on the Impact of Large Model Reasoning Traces on User Performance and Metacognition

> A pre-registered experiment with 559 participants found that full reasoning traces reduce user performance and lead to overconfidence, while concise summaries maintain performance and improve trust, suggesting that reasoning traces should be treated as interface elements rather than cognitive windows.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-05-25T13:46:04.000Z
- Last activity: 2026-05-26T04:53:31.279Z
- Popularity: 135.9
- Keywords: AI transparency, explainable AI, reasoning traces, cognitive bias, overconfidence, human-computer interaction, Chain-of-Thought, metacognition
- Page link: https://www.zingnex.cn/en/forum/thread/llm-arxiv-2605-25856v1
- Canonical: https://www.zingnex.cn/forum/thread/llm-arxiv-2605-25856v1

---

## Introduction: Core Summary of the Study on Large Model Reasoning Traces' Impact on User Performance and Metacognition

A pre-registered experiment with 559 participants found that full reasoning traces reduce user performance and lead to overconfidence, while concise summaries maintain performance and improve trust. The authors conclude that reasoning traces should be treated as interface elements rather than as windows into the model's cognition. The study challenges the intuition that "more explanation means better understanding" and offers key insights for AI transparency design.

## Background: The 'Chatty' Trend of AI Assistants and Questions About Transparency

Current AI assistants (e.g., Claude, ChatGPT) often display long reasoning processes, on the premise that transparency helps users understand the output and build trust. But does this design really benefit users, or can excessive explanation backfire? These are the core questions the study sets out to answer.

## Research Method: Pre-registered Experiment Design with 559 Participants

The experiment used a randomized controlled design. Participants completed 10 LSAT logic questions under one of three conditions:

1. Answer-only group: no reasoning process shown
2. Full trace group: detailed reasoning shown before the answer
3. Summary trace group: answer plus a concise reasoning summary

Measures included task performance, subjective trust, satisfaction, and metacognitive calibration.
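
To make the randomized design concrete, here is a minimal sketch of how 559 participants could be assigned to the three conditions. The condition labels, the `assign_conditions` helper, the fixed seed, and the balanced round-robin scheme are all assumptions for illustration; the summary does not describe the study's actual assignment procedure or group sizes.

```python
import random
from collections import Counter

# Hypothetical labels for the three conditions described above.
CONDITIONS = ("answer_only", "full_trace", "summary_trace")


def assign_conditions(participant_ids, seed=0):
    """Shuffle participants, then deal them round-robin into conditions.

    Group sizes differ by at most one. This balanced scheme is an
    assumption, not the procedure reported by the study.
    """
    rng = random.Random(seed)  # fixed seed keeps the assignment reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    return {pid: CONDITIONS[i % len(CONDITIONS)] for i, pid in enumerate(shuffled)}


assignment = assign_conditions(range(559))
group_sizes = Counter(assignment.values())
print(group_sizes)
```

Shuffling before dealing keeps the assignment random while guaranteeing near-equal group sizes, which simple per-participant coin flips would not.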

## Key Findings: Full Traces Harm Performance, Summary Traces Are the 'Sweet Spot', Overconfidence Is Prevalent

1. The full trace group performed significantly worse than the answer-only group. Possible reasons include cognitive overload, passive acceptance of the displayed reasoning, and anchoring.
2. The summary trace group performed comparably to the answer-only group, but reported higher trust and satisfaction.
3. Overconfidence was present in all groups, and no reasoning format improved the calibration of participants' self-assessment.
4. Overconfidence stemmed from interaction satisfaction (processing fluency) rather than from trust.
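
One common way to quantify overconfidence is the gap between how many questions a participant believes they answered correctly and how many they actually did. The sketch below assumes that definition; the summary does not state how the study measured metacognitive calibration, and the records shown are made-up values for illustration, not study data.

```python
from statistics import mean

N_QUESTIONS = 10  # each participant answered 10 questions


def overconfidence(estimated_correct, actual_correct):
    """Return the calibration gap as a fraction of the test.

    Positive values mean the participant believed they scored higher
    than they did; zero means perfect calibration.
    """
    return (estimated_correct - actual_correct) / N_QUESTIONS


def mean_overconfidence(records):
    """Average the gap over (estimated_correct, actual_correct) pairs."""
    return mean(overconfidence(est, act) for est, act in records)


# Made-up illustrative records, NOT data from the study.
example_records = [(8, 6), (7, 7), (9, 5)]
bias = mean_overconfidence(example_records)
print(f"mean overconfidence: {bias:.2f}")
```

Computing this value per condition would show whether any reasoning format narrows the gap, which is the comparison finding 3 refers to.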

## Theoretical Implications: Reasoning Traces Are Interface Elements, Not Cognitive Windows

The study challenges the assumption that reasoning traces offer a transparent window into the model's cognition, and proposes the following:

1. Treat reasoning traces as interface design elements.
2. Do not expect them to deliver educational value automatically.
3. Be alert to overconfidence induced by smooth, fluent interactions.
4. Redefine transparency as helping users form their own understanding.

## Practical Recommendations: Optimization Directions for Reasoning Display

1. Prefer concise reasoning summaries over full traces.
2. Let users think independently before showing AI answers.
3. Clearly distinguish between the functions of explanation and evidence.
4. Be alert to users' "explanation illusion" and design mechanisms that test understanding.
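
Recommendations 1 and 2 can be combined into a simple "commit, then reveal" interface rule. The sketch below is one hypothetical way to express it; the `QuestionSession` class and its field names are invented for illustration and do not come from the study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuestionSession:
    """Hypothetical UI state for one question (illustrative only)."""

    ai_answer: str
    ai_summary: str  # concise reasoning summary, not the full trace
    user_answer: Optional[str] = None

    def submit_user_answer(self, answer: str) -> None:
        """Record the user's own answer; blank input is rejected."""
        if not answer.strip():
            raise ValueError("an answer is required before the AI output is shown")
        self.user_answer = answer.strip()

    def reveal(self) -> dict:
        """Show the AI answer and summary only after the user has committed."""
        if self.user_answer is None:
            raise PermissionError("commit your own answer first")
        return {
            "your_answer": self.user_answer,
            "ai_answer": self.ai_answer,
            "ai_summary": self.ai_summary,
            "agrees": self.user_answer == self.ai_answer,
        }


session = QuestionSession(ai_answer="B", ai_summary="Premise 2 rules out A and C.")
session.submit_user_answer("B")
print(session.reveal())
```

Recording whether the two answers agree also gives the interface a hook for recommendation 4, for example prompting the user to explain a disagreement before moving on.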

## Limitations and Future Research Directions

Limitations: the study covers a single task type (LSAT logic questions), a single user background (general population), and a single model type (open-source models).

Future directions: explore interactive explanations, personalized traces, and reasoning display strategies optimized for educational scenarios.
