Section 01
FaithfulnessBench: Verifying Chain-of-Thought Faithfulness of Reasoning Models via Causal Intervention (Guide)
Project Basic Information
- Original Author/Maintainer: pratik916
- Source Platform: GitHub
- Project Link: faithfulnessbench
- Release Date: 2026-06-09
Core Guide
FaithfulnessBench is an open-source framework that measures the chain-of-thought (CoT) faithfulness of reasoning models using four orthogonal causal probes, addressing the circular reasoning problem of traditional single-probe measurements. Its core innovation is the use of configurable synthetic models to verify that the probes themselves are effective. Its main finding is that CoT faithfulness is not a single scalar but a "faithfulness card" containing four sub-scores, so a multi-dimensional evaluation is needed to judge model behavior accurately.
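To make the "card versus scalar" point concrete, here is a minimal sketch in Python. It is not FaithfulnessBench's API: the class name `FaithfulnessCard`, the field names `probe_a` through `probe_d`, the example scores, and the choice of a plain average as the scalar are all hypothetical, chosen only to illustrate why one number can hide differences that four sub-scores reveal.

```python
from dataclasses import dataclass, astuple
from statistics import mean


@dataclass(frozen=True)
class FaithfulnessCard:
    """Hypothetical container for four probe sub-scores in [0, 1].

    The field names are placeholders; this guide does not name the probes.
    """
    probe_a: float
    probe_b: float
    probe_c: float
    probe_d: float

    def scalar(self) -> float:
        # Assumption: a single-score view is modeled here as a plain average.
        return mean(astuple(self))


# Made-up scores for two imaginary models, for illustration only.
model_x = FaithfulnessCard(0.9, 0.9, 0.1, 0.1)
model_y = FaithfulnessCard(0.5, 0.5, 0.5, 0.5)

# The averaged scalars agree (up to floating-point tolerance) ...
same_scalar = abs(model_x.scalar() - model_y.scalar()) < 1e-9
# ... while the cards themselves differ on every probe.
same_card = model_x == model_y

print(same_scalar, same_card)
```

Under these assumptions, the two imaginary models are indistinguishable by their averaged score yet have clearly different cards, which is the kind of difference a single-scalar report would miss.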