Section 01
CCR.GB Benchmark: Guide to Evaluating Compositional Causal Reasoning Capabilities of Large Language Models
Title: CCR.GB: Evaluating the Compositional Causal Reasoning Capabilities of Large Language Models

This article introduces CCR.GB, a benchmark framework for systematically evaluating how large language models (LLMs) perform on compositional causal reasoning tasks. The benchmark builds on Judea Pearl's causal hierarchy, which has three levels: association, intervention, and counterfactual. It targets a gap left by existing benchmarks, which fail to capture complex causal structures. The project is maintained by kun-zero162, the source code is hosted in a GitHub repository, and the related paper was published at ICML 2025.
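
To make the three levels of the causal hierarchy concrete, here is a minimal sketch on a toy structural causal model. The model itself (variables `Z`, `U`, `X`, `Y` and their equations) and the helper names are hypothetical, chosen only for illustration. They are not taken from CCR.GB or its paper.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy model: Z and U are independent fair coins (exogenous noise).
# Structural equations: X = Z, Y = Z or (X and U). Z confounds X and Y.
WORLDS = [((z, u), Fraction(1, 4)) for z, u in product((0, 1), repeat=2)]


def solve(z, u, do_x=None):
    """Evaluate the model; do_x, if given, overrides X's structural equation."""
    x = z if do_x is None else do_x
    y = int(z or (x and u))
    return x, y


def association(x_obs):
    """Level 1: P(Y=1 | X=x_obs), computed by conditioning on observations."""
    num = den = Fraction(0)
    for (z, u), p in WORLDS:
        x, y = solve(z, u)
        if x == x_obs:
            den += p
            num += p * y
    return num / den


def intervention(x_set):
    """Level 2: P(Y=1 | do(X=x_set)), computed by forcing X in every world."""
    return sum(p * solve(z, u, do_x=x_set)[1] for (z, u), p in WORLDS)


def counterfactual(x_obs, y_obs, x_cf):
    """Level 3: P(Y=1 had X been x_cf | X=x_obs, Y=y_obs).

    Abduction: keep only noise settings consistent with the observation.
    Action and prediction: force X=x_cf in those settings and read off Y.
    """
    num = den = Fraction(0)
    for (z, u), p in WORLDS:
        if solve(z, u) == (x_obs, y_obs):
            den += p
            num += p * solve(z, u, do_x=x_cf)[1]
    return num / den


print(association(1))           # 1
print(intervention(1))          # 3/4
print(counterfactual(0, 0, 1))  # 1/2
```

In this toy model the three queries give three different answers for the same pair of variables, which is why a benchmark that only tests one level can miss errors at the others.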