# When to Trust Tools? An Adaptive Tool Trust Calibration Method for Tool-Integrated Mathematical Reasoning

> This article introduces the ATTC framework, which uses code block confidence scores to guide a model in adaptively trusting or ignoring tool results. It mitigates the "tool neglect" problem in tool-integrated reasoning and improves performance by 4.1% to 7.5%.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-04-09T14:14:37.000Z
- Last activity: 2026-04-10T02:46:24.024Z
- Popularity: 116.5
- Keywords: tool-integrated reasoning, large language models, mathematical reasoning, confidence calibration, tool calling, adaptive learning
- Page link: https://www.zingnex.cn/en/forum/thread/llm-arxiv-2604-08281v1
- Canonical: https://www.zingnex.cn/forum/thread/llm-arxiv-2604-08281v1
- Markdown source: floors_fallback

---

## [Main Floor] When to Trust Tools? The ATTC Framework Solves the Tool Neglect Problem in Tool-Integrated Reasoning

This article addresses the "tool neglect" problem in Tool-Integrated Reasoning (TIR), in which models often ignore correct tool results, and proposes the Adaptive Tool Trust Calibration (ATTC) framework. ATTC uses the confidence score of each generated code block to decide whether the model should trust or ignore the corresponding tool result. This reduces tool neglect and improves performance by 4.1% to 7.5% across multiple models and datasets.

## [Background] The Rise and Hidden Concerns of Tool-Integrated Reasoning: Models Don't Know When to Trust Tools

With the development of Large Reasoning Models (LRMs), Tool-Integrated Reasoning (TIR) has become an important paradigm for overcoming the limits of purely parametric reasoning: the model calls external tools (such as Python or SQL) to obtain exact results. However, existing TIR models suffer from "tool neglect": when their own reasoning conflicts with a tool result, they often keep their own answer and may even actively disregard a correct tool output. The cause is that training does not explicitly teach models to evaluate and integrate tool results, so tool integration ends up as a formality rather than a real input to the reasoning.

## [Method] The ATTC Framework: An Adaptive Trust Calibration Mechanism Based on Code Confidence

The core of the ATTC framework is a dynamic decision-making mechanism based on code block confidence:

1. **Confidence Estimation Module**: Calculates a confidence score for each generated code block, reflecting how certain the model is about that tool call.
2. **Dynamic Trust Decision**: Adopts the tool result when confidence is high and falls back on internal reasoning when confidence is low.
3. **Calibration Learning Mechanism**: Establishes a mapping between confidence and tool reliability through a dedicated training objective.

In implementation, ATTC modifies the loss function so that it penalizes ignoring correct tool results and reinforces correct trust decisions. The modified objective integrates into the existing TIR training process.
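
The first two components can be illustrated with a minimal sketch. The summary above does not say how the confidence score is computed, so the code assumes one common choice: the geometric mean of the token probabilities in the generated code block. The `ToolCall` fields, the function names, and the `0.8` threshold are all hypothetical and only serve the illustration; this is not the authors' implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class ToolCall:
    """One tool invocation during reasoning (hypothetical structure)."""
    token_logprobs: list[float]  # log-probabilities of the generated code tokens
    tool_output: str             # result returned by the interpreter
    internal_answer: str         # answer the model reached without the tool


def code_block_confidence(token_logprobs: list[float]) -> float:
    """Assumed confidence score: geometric mean of the code token probabilities."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def choose_answer(call: ToolCall, threshold: float = 0.8) -> str:
    """Trust the tool when code confidence is high, otherwise keep internal reasoning.

    The threshold is an illustrative value, not one taken from the paper.
    """
    confidence = code_block_confidence(call.token_logprobs)
    return call.tool_output if confidence >= threshold else call.internal_answer


if __name__ == "__main__":
    confident = ToolCall([-0.01, -0.02, -0.03], tool_output="42", internal_answer="41")
    unsure = ToolCall([-1.0, -2.0], tool_output="42", internal_answer="41")
    print(choose_answer(confident))  # code confidence is about 0.98, so the tool result is used
    print(choose_answer(unsure))     # code confidence is about 0.22, so the internal answer is kept
```

A fixed threshold is the simplest possible decision rule. The calibration learning mechanism described above would instead learn how confidence maps to tool reliability during training.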

## [Evidence] Experimental Verification: ATTC Significantly Alleviates Tool Neglect, with Performance Improvements of 4.1%-7.5%

The experiments show that ATTC has a clear effect:

- **Alleviates Tool Neglect**: Cases in which the model ignores a correct tool result are greatly reduced.
- **Performance Improvement**: Performance increases by 4.1% to 7.5% across different model sizes and datasets.
- **Good Generalization**: The improvements are stable across model architectures and datasets.

In the case study, the baseline model called the tool but ignored its result, whereas the ATTC-trained model trusted the tool output and gave the correct answer.
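
One way to quantify tool neglect is sketched below. The summary does not give the paper's exact metric, so this assumes a simple definition: among episodes where the tool output matches the reference answer, the fraction in which the model's final answer differs from the tool output. The `Episode` structure and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """One solved problem (hypothetical record layout)."""
    tool_output: str   # result returned by the tool
    final_answer: str  # answer the model finally reported
    reference: str     # ground-truth answer


def tool_neglect_rate(episodes: list[Episode]) -> float:
    """Share of correct tool results that the model did not adopt (assumed definition)."""
    correct_tool = [e for e in episodes if e.tool_output == e.reference]
    if not correct_tool:
        return 0.0
    neglected = sum(e.final_answer != e.tool_output for e in correct_tool)
    return neglected / len(correct_tool)


if __name__ == "__main__":
    episodes = [
        Episode("42", "42", "42"),  # tool correct, adopted
        Episode("42", "41", "42"),  # tool correct, neglected
        Episode("10", "10", "10"),  # tool correct, adopted
        Episode("3", "3", "3"),     # tool correct, adopted
        Episode("7", "8", "8"),     # tool wrong, excluded from the rate
    ]
    print(tool_neglect_rate(episodes))  # 1 neglected out of 4 correct tool results
```

Comparing this rate before and after training would make a claim such as "greatly reduced" measurable. Exact string matching is a simplification, since real evaluations usually normalize mathematical answers first.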

## [Conclusion and Recommendations] Technical Insights and Future Directions of ATTC

ATTC offers several technical insights:

- **Metacognitive Ability**: Tool integration requires models to develop the metacognition needed to evaluate tool reliability.
- **Value of Confidence**: Code confidence can serve as a decision signal in other scenarios as well.
- **Adaptive Decision-Making**: Dynamically adjusting behavior is more robust than following fixed rules.

Future work can explore further uses of confidence as a signal. The paper concludes that ATTC offers a way to balance autonomous reasoning with external assistance and positions it as a basis for subsequent research on tool-integrated reasoning.
