Section 01
CausalT5k Benchmark: A Diagnostic Tool for Causal Reasoning Capabilities of Large Language Models
CausalT5k is a diagnostic benchmark specifically for evaluating the causal reasoning capabilities of large language models, containing 5000 carefully designed questions. Its design follows principles such as comprehensive coverage of causal reasoning types, difficulty stratification, and domain diversity, aiming to help researchers identify the strengths and weaknesses of models in understanding causal relationships. Currently, the project is in its initial stage and is of great significance for model development (diagnosing weaknesses, guiding training) and research standardization.