Section 01
EvalQReason: Step-Level LLM Reasoning Evaluation via Probability Distribution Analysis (Main Guide)
Core Overview
EvalQReason is a three-stage framework that evaluates the reasoning of Large Language Models (LLMs) step by step by analyzing their output probability distributions. It requires no manual annotation and reports F1 scores of up to 0.98 for correctness prediction on math and medical tasks.
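For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how an F1 score is computed from binary correctness predictions. It is a generic illustration, not code from the EvalQReason repository: the function name `f1_score`, the toy labels, and the choice of 1 as the positive class are all assumptions, and the source does not state which class is treated as positive or how scores are averaged.

```python
def f1_score(y_true, y_pred):
    """Binary F1, treating label 1 as the positive class (an assumption)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: 1 = reasoning judged correct, 0 = incorrect.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
score = f1_score(y_true, y_pred)
print(f"F1 = {score:.2f}")
```

On this toy data there are 4 true positives, 1 false positive, and 1 false negative, so precision and recall are both 0.8 and the F1 score is 0.8.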
Basic Information
- Author: Shaima Ahmad Freja (University of Stavanger)
- Source: GitHub
- Release Date: June 2026
- Link: https://github.com/Shaima4127/EvalQReason
- Contact: shaima.a.freja@uis.no