# Quantitative Study on the Faithfulness of Confidence Expression in Large Reasoning Models

> The study finds that large reasoning models face significant challenges in the faithfulness of confidence expression; improved reasoning ability does not automatically translate to better calibration, and different confidence estimators give divergent assessments of the same reasoning process.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-06-02T17:53:45.000Z
- Last activity: 2026-06-03T04:56:37.258Z
- Popularity: 144.9
- Keywords: faithful calibration, large reasoning models, confidence expression, uncertainty quantification, AI safety, chain of thought
- Page link: https://www.zingnex.cn/en/forum/thread/llm-arxiv-2606-03969v1
- Canonical: https://www.zingnex.cn/forum/thread/llm-arxiv-2606-03969v1

---

## [Introduction] Core Insights from the Study on Faithfulness of Confidence Expression in Large Reasoning Models

### Key Takeaways
The study focuses on **Faithful Calibration (FC)**, i.e. the faithfulness of confidence expression, in Large Reasoning Models (LRMs) and finds:
1. Improved reasoning ability in LRMs does not automatically translate into better calibration;
2. Different confidence estimators give divergent assessments of the same reasoning process;
3. FC is a cornerstone of AI trustworthiness and is especially critical in high-risk scenarios (medical, legal, etc.);
4. Current LRMs face significant calibration challenges, so FC needs to be optimized as an objective in its own right.

Original source: Published on arXiv on June 2, 2026, titled *Quantifying Faithful Confidence Expression in Large Reasoning Models* (link: http://arxiv.org/abs/2606.03969v1)

## Background: Definition and Importance of Faithful Confidence Expression in LRMs

### Definition
**Faithful Calibration (FC)** refers to the consistency between the model's internal uncertainty and its linguistic expression of confidence: the model should sound hesitant when it is uncertain and confident when it is certain.
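
To make the definition concrete, the sketch below maps hedging phrases to a numeric "expressed confidence" and compares it with an internal estimate. The phrase table, the default score, and the function names are illustrative assumptions and do not come from the paper; a real study would need a far more careful way of scoring language.

```python
# Hypothetical mapping from linguistic markers to an expressed confidence in [0, 1].
# The phrases and scores are illustrative assumptions, not values from the study.
EXPRESSED_CONFIDENCE = {
    "without doubt": 0.99,
    "obviously": 0.95,
    "probably": 0.75,
    "maybe": 0.50,
    "perhaps": 0.50,
}
DEFAULT_CONFIDENCE = 0.85  # assumed score for a plain, unhedged assertion


def expressed_confidence(text: str) -> float:
    """Return the score of the most cautious marker found in the text."""
    lowered = text.lower()
    scores = [s for phrase, s in EXPRESSED_CONFIDENCE.items() if phrase in lowered]
    return min(scores) if scores else DEFAULT_CONFIDENCE


def faithfulness_gap(text: str, internal_confidence: float) -> float:
    """Positive gap: the wording is more confident than the internal estimate."""
    return expressed_confidence(text) - internal_confidence
```

Under these assumptions, a positive gap corresponds to wording that overstates the internal estimate and a negative gap to wording that understates it; a faithful response would have a gap near zero.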

### Necessity in High-Risk Scenarios
- **Medical diagnosis**: Overconfidence in a wrong diagnosis may mislead doctors and patients;
- **Legal consultation**: Settled points must be distinguished accurately from gray areas;
- **Financial decision-making**: Expressed confidence directly affects asset allocation risk;
- **Educational tutoring**: Students need help telling "certain knowledge" apart from "speculation".

### Problem Highlight
LRMs are known for lengthy chains of thought, but their reasoning traces may not reflect the model's true confidence level and may use confident rhetoric to mask uncertainty.

## Four Limitations of Existing Evaluation Methods

Traditional methods face fundamental challenges with LRMs:
1. **No clear step boundaries in chain of thought**: Continuous text is hard to decompose into discrete steps;
2. **Inconsistent step structures**: Large structural differences between mathematical derivation and common sense reasoning make cross-step comparison difficult;
3. **Complex conditional dependencies**: Branches like "if A then B else C" lead to complex confidence propagation/aggregation;
4. **Difficulty estimating internal confidence**: Simple token probabilities cannot reflect the deep uncertainty of LRMs.

## Research Framework: Three-Dimensional Internal Uncertainty Analysis

### Three Dimensions
1. **Token probability dimension**: Judge uncertainty from how dispersed the probability distribution over key tokens is;
2. **Hidden state dimension**: Extract deeper confidence signals from the network's activation states;
3. **Sampling response consistency dimension**: Measure how much multiple sampled responses disagree; more disagreement indicates more uncertainty.
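
The three dimensions above can be sketched as three small scoring functions. This is a minimal illustration under stated assumptions: entropy stands in for "dispersion", a linear probe with externally trained weights stands in for the hidden-state signal, and majority agreement stands in for sampling consistency. None of these choices is confirmed as the paper's exact estimator.

```python
import math
from collections import Counter


def token_entropy(probs: list[float]) -> float:
    """Shannon entropy (nats) of one token's probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def probe_confidence(hidden: list[float], weights: list[float], bias: float) -> float:
    """Linear probe on a hidden state; the weights would be trained separately."""
    z = sum(h * w for h, w in zip(hidden, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def sampling_consistency(answers: list[str]) -> float:
    """Fraction of sampled answers that agree with the most common one."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    return Counter(answers).most_common(1)[0][1] / len(answers)
```

Note the differing directions: higher entropy means more uncertainty, while higher probe output and higher consistency mean more confidence, so the scores need to be aligned before they are compared.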

### Prefix Conditional Sampling Strategy
Fix a chain-of-thought prefix, sample several continuations from it, and observe how the subsequent generations vary. This isolates the impact of specific factors on confidence (e.g., fix the first half of the reasoning to evaluate confidence in the second half).
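
A minimal sketch of this sampling loop follows. The `generate` callable is a hypothetical stand-in for a model call that continues a prefix and returns a final answer; `make_toy_generator` is a toy substitute used only so the example runs without a model, and its behaviour (more reasoning lines, more stable answers) is an assumption for illustration.

```python
import random
from collections import Counter
from typing import Callable


def prefix_conditional_confidence(
    generate: Callable[[str], str],
    prefix: str,
    n_samples: int = 20,
) -> tuple[str, float]:
    """Sample continuations of a fixed prefix; return (majority answer, agreement)."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    answers = [generate(prefix) for _ in range(n_samples)]
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count / n_samples


def make_toy_generator(seed: int = 0) -> Callable[[str], str]:
    """Toy stand-in for a model: each newline in the prefix stabilises the answer."""
    rng = random.Random(seed)

    def generate(prefix: str) -> str:
        p_correct = min(0.5 + 0.1 * prefix.count("\n"), 1.0)
        return "42" if rng.random() < p_correct else "41"

    return generate
```

Calling `prefix_conditional_confidence` with progressively longer prefixes of the same trace gives one agreement score per truncation point, which shows where in the reasoning the answer becomes stable.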

## Key Findings: Reasoning Ability ≠ Calibration Ability

1. **FC is a significant challenge for LRMs**: LRMs reason well but calibrate poorly; their internal uncertainty and expressed confidence are misaligned;
2. **Reasoning does not automatically improve calibration**: Longer chains of thought do not improve calibration. Models can "reason" but cannot "evaluate their reasoning";
3. **Prompt intervention has limited effect**: Prompting techniques developed for non-reasoning models (e.g., asking the model to hedge) have limited effect on LRMs;
4. **Estimator divergence**: Different methods (token probability, hidden state, sampling) give inconsistent assessments of the same reasoning.
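
One simple way to quantify the estimator divergence in the last finding is a rank correlation between the confidence scores two estimators assign to the same set of reasoning traces. The sketch below uses Spearman correlation as an assumed measure; the source does not say which agreement statistic the authors used.

```python
def ranks(values: list[float]) -> list[float]:
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two items")
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (var_x * var_y) ** 0.5
```

A value near 1 means two estimators order the traces the same way; values near 0 or below indicate the kind of disagreement the study reports.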

## Failure Modes: Common Calibration Issues in LRMs

1. **Overconfidence**: Using phrases like "obviously" or "without doubt" to mask uncertainty (attributed to training data bias);
2. **False modesty**: Using "maybe" or "perhaps" when the model is actually certain (attributed to safety training that discourages absolute assertions);
3. **Decoupling of reasoning length and confidence**: Lengthy reasoning does not necessarily correspond to high confidence;
4. **Failure to propagate confidence in conditional reasoning**: Premise uncertainty is transferred to conclusions incorrectly (e.g., the model expresses the wrong confidence in B when B is derived from A with 70% confidence).
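
For the conditional-reasoning case, the law of total probability gives a reference point for what correctly propagated confidence would look like. The sketch assumes one reading of the example above, namely that premise A is held with 70% confidence; the function and parameter names are hypothetical.

```python
def propagate_confidence(
    p_premise: float,
    p_conclusion_given_premise: float,
    p_conclusion_given_not_premise: float = 0.0,
) -> float:
    """Law of total probability: P(B) = P(B|A) * P(A) + P(B|not A) * (1 - P(A))."""
    for p in (p_premise, p_conclusion_given_premise, p_conclusion_given_not_premise):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return (
        p_conclusion_given_premise * p_premise
        + p_conclusion_given_not_premise * (1.0 - p_premise)
    )
```

With `p_premise=0.7` and a derivation that is itself 90% reliable, the conclusion warrants about 63% confidence if B cannot hold without A. A model that states B flatly as fact after such a chain has failed to propagate the uncertainty.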

## Implications for AI Safety and Alignment

1. **FC needs independent optimization**: Current alignment work focuses on helpfulness, harmlessness, and honesty; FC should become an explicit goal;
2. **New training methods**: Existing supervised learning and RLHF are insufficient; loss functions that reward accurate confidence expression need to be developed;
3. **Innovation in evaluation methods**: More reliable and consistent evaluation paradigms are needed (e.g., combining multiple estimation methods);
4. **UI design adjustments**: Interfaces should remind users not to rely solely on the model's stated confidence and should provide additional reliability indicators.
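
As an illustration of what "rewarding accurate confidence expression" could mean, the sketch below computes the Brier score, a standard proper scoring rule. It is offered as a familiar example of such an objective, not as the training loss proposed in the paper.

```python
def brier_score(confidences: list[float], correct: list[bool]) -> float:
    """Mean squared gap between stated confidence and the 0/1 outcome (lower is better)."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need equal-length, non-empty inputs")
    total = sum((c - float(ok)) ** 2 for c, ok in zip(confidences, correct))
    return total / len(confidences)
```

Because the penalty grows with the square of the gap, a wrong answer stated at 0.9 confidence costs 0.81 while the same wrong answer at 0.6 costs 0.36, so the score favours honest hedging over confident error.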

## Summary and Future Directions

### Summary
The study presents the first systematic quantification of FC in LRMs. It shows that reasoning and calibration are separate capabilities and cautions that applications in high-risk scenarios require care.

### Limitations
- The evaluation is limited to Q&A tasks and does not cover creative writing or code generation;
- Internal uncertainty estimation is still imperfect.

### Future Directions
1. Develop calibration-aware training objectives;
2. Build real-time calibration feedback mechanisms;
3. Study cross-task calibration transfer;
4. Optimize user interaction design (investigate how users understand expressed confidence).
