Zing Forum

Reading

FHI: A New Framework for Hallucination Detection in Large Language Models Based on Causal Attribution Alignment

This article introduces a new composite metric called the Faithfulness-Hallucination Index (FHI), which detects hallucinations in large language models by analyzing the alignment between model explanations and internal attribution signals. The framework evaluates the credibility of model outputs from four complementary dimensions, providing an interpretable new approach for LLM hallucination detection.

Tags: LLM hallucination detection · Explainable AI · Causal attribution · FHI metric · LLM safety · XAI · HaluEval
Published 2026-04-05 16:45 · Recent activity 2026-04-05 16:52 · Estimated read: 5 min

Section 01

[Introduction] FHI: A New Framework for LLM Hallucination Detection Based on Causal Attribution Alignment

This article introduces the Faithfulness-Hallucination Index (FHI), a new composite metric that detects hallucinations in large language models by analyzing the alignment between model explanations and internal attribution signals. The framework evaluates the credibility of outputs from four complementary dimensions, providing an interpretable new approach for LLM hallucination detection.


Section 02

Background: The Fundamental Challenges of LLM Hallucination Problems

Hallucinations in large language models are a core bottleneck for reliable deployment, and traditional methods struggle to identify errors at the output stage. The root cause is a causal disconnect between a model's 'explanations' and its actual 'reasoning process'. A team from Delhi Technological University (India) proposed the FHI framework, which detects hallucinations by checking whether model explanations align with internal attribution signals (such as attention weights and gradient-based attributions).


Section 03

Methodology: FHI's Four-Dimensional Evaluation System and Calculation Rules

FHI consists of four dimensions:

  1. Attribution Alignment Score (AAS): Measures the structural overlap between the explanation and internal attribution signals (weight: 0.30);
  2. Causal Impact Score (CIS): Evaluates causal support by perturbing explanation tokens (weight: 0.35);
  3. Explanation Stability Score (ESS): Assesses the consistency of explanations across multiple generations (weight: 0.20);
  4. Hallucination Confidence Gap (HCG): Captures the misalignment between confidence and factual correctness (weight: 0.15).

The composite score is computed as

FHI = clip(w1·AAS + w2·CIS + w3·ESS − w4·HCG, 0, 1)

with default weights w1 = 0.30 (AAS), w2 = 0.35 (CIS), w3 = 0.20 (ESS), w4 = 0.15 (HCG). An output with FHI < 0.5 is flagged as hallucinated.
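The composite rule above is simple enough to sketch directly. The following is a minimal illustration (not the authors' code) that assumes the four component scores have already been computed and normalized to [0, 1]:

```python
# Sketch of the FHI composite score with the paper's default weights.
# The component scores (aas, cis, ess, hcg) are assumed precomputed in [0, 1].

def fhi_score(aas: float, cis: float, ess: float, hcg: float,
              weights: tuple = (0.30, 0.35, 0.20, 0.15)) -> float:
    """Faithfulness-Hallucination Index: weighted sum of the three
    supporting scores minus the confidence-gap penalty, clipped to [0, 1]."""
    w1, w2, w3, w4 = weights
    raw = w1 * aas + w2 * cis + w3 * ess - w4 * hcg
    return max(0.0, min(1.0, raw))  # clip(·, 0, 1)

def is_hallucination(fhi: float, threshold: float = 0.5) -> bool:
    """Apply the paper's decision rule: FHI < 0.5 indicates hallucination."""
    return fhi < threshold
```

Note that with the default weights a perfect output (AAS = CIS = ESS = 1, HCG = 0) scores 0.85, not 1.0, since the positive weights sum to 0.85 before clipping.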

Section 04

Evidence: Technical Implementation and Experimental Validation of FHI

The technical implementation covers a complete XAI toolchain supporting attention analysis, gradient attribution, and SHAP value computation. Perturbation experiments implement token-level masking with output comparison, and the evaluation suite spans factual question answering, multi-hop reasoning, and adversarial examples. The framework was validated on datasets including TriviaQA, HaluEval, and MuSiQue; on MuSiQue in particular, FHI proved sensitive to hallucinations arising in complex reasoning chains.
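The token-level masking idea can be illustrated with a small sketch. This is an assumed toy implementation, not the paper's code: it masks the tokens an explanation cites, re-runs a model (here a stand-in callable), and measures how much the output changes, in the spirit of the Causal Impact Score:

```python
# Illustrative token-level masking perturbation (hypothetical sketch).
# `model` is any callable mapping a token list to an output token list;
# a real setup would query the LLM under evaluation.

def masked_input(tokens: list, cited_indices: set, mask: str = "[MASK]") -> list:
    """Replace the tokens cited by the explanation with a mask token."""
    return [mask if i in cited_indices else t for i, t in enumerate(tokens)]

def causal_impact(model, tokens: list, cited_indices: set) -> float:
    """Disagreement rate between the original and perturbed outputs.
    A high value means the cited tokens causally support the answer."""
    base = model(tokens)
    perturbed = model(masked_input(tokens, cited_indices))
    n = max(len(base), len(perturbed))
    diffs = sum(a != b for a, b in zip(base, perturbed))
    diffs += abs(len(base) - len(perturbed))  # length mismatch counts as change
    return diffs / n if n else 0.0
```

With a toy model that answers "yes" only when the token "paris" is present, masking that token flips the output (impact 1.0), while masking an irrelevant token leaves it unchanged (impact 0.0).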


Section 05

Conclusion and Outlook: Practical Significance and Future Directions of FHI

FHI offers a new perspective on LLM interpretability and safety: its pre-output detection mechanism can flag risks before text is emitted, making it suitable for high-stakes domains such as healthcare and law. The framework's modular design eases extension, and future work could adapt it to multimodal models and agent systems. Hallucination detection is key to trustworthy AI, and by combining interpretability techniques with causal reasoning, FHI helps underwrite the safe deployment of LLMs.