Section 01
[Introduction] SciR: A Multi-Document Benchmark for Evaluating LLM Scientific Reasoning Capabilities
SciR is a benchmark framework developed by the Idiap Research Institute in Switzerland to evaluate the scientific reasoning capabilities of large language models (LLMs). It covers three core forms of reasoning (deduction, induction, and causal abduction), supports parameterized control over reasoning complexity and premise confusion, and includes multi-document settings. Its goal is to systematically assess how LLMs perform on rigorous scientific reasoning tasks and to fill a gap in current evaluation.
Original Author/Maintainer: idiap (Idiap Research Institute, Switzerland)
Source Platform: GitHub
Release Date: 2026-06-12
Original Link: https://github.com/idiap/SciR
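To make the idea of parameterized control more concrete, the toy sketch below generates a deductive task whose difficulty is set by two knobs. It is not SciR's code or API: the names `DeductionTask`, `make_deduction_task`, `split_into_documents`, `depth`, and `num_distractors` are all hypothetical, and treating "premise confusion" as a count of irrelevant distractor premises is an assumption made purely for illustration.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class DeductionTask:
    """Hypothetical container for one generated task (not a SciR type)."""
    premises: List[str]
    question: str
    answer: bool


def make_deduction_task(depth: int, num_distractors: int, seed: int = 0) -> DeductionTask:
    """Build a toy implication chain P0 -> P1 -> ... -> P{depth}.

    `depth` stands in for reasoning complexity; `num_distractors` stands in
    for premise confusion (an assumed interpretation).
    """
    rng = random.Random(seed)
    # Relevant premises: one starting fact plus `depth` chained rules.
    premises = ["P0 holds."]
    premises += [f"If P{i} holds, then P{i + 1} holds." for i in range(depth)]
    # Distractors use disjoint symbols (D*), so they never change the answer.
    premises += [f"If D{j} holds, then D{j + 1} holds." for j in range(num_distractors)]
    # Shuffle so the relevant chain is not given in reading order.
    rng.shuffle(premises)
    return DeductionTask(premises, f"Does P{depth} hold?", True)


def split_into_documents(premises: List[str], num_documents: int) -> List[List[str]]:
    """Distribute premises round-robin to mimic a multi-document setting."""
    return [premises[i::num_documents] for i in range(num_documents)]


if __name__ == "__main__":
    task = make_deduction_task(depth=3, num_distractors=2, seed=1)
    for doc_id, doc in enumerate(split_into_documents(task.premises, 2)):
        print(f"Document {doc_id}: {doc}")
    print(task.question, "->", task.answer)
```

Raising `depth` lengthens the chain a model must follow, while raising `num_distractors` adds premises it must learn to ignore. Splitting the shuffled premises across documents forces the relevant steps to be gathered from several sources before the chain can be completed.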