Section 01
Introduction: A Structured Evaluation Approach for Validating LLM Temporal Reasoning with Temporal Graph Constraints
An MSci thesis project from the University of Edinburgh proposes a four-layer evaluation framework (Prediction, Validation, Scoring, Reporting) that converts the temporal reasoning outputs of large language models (LLMs) into temporal graphs for structured validation. The framework supports four temporal relation labels: BEFORE, AFTER, SIMULTANEOUS, and UNKNOWN. Beyond checking whether predictions agree with the gold-standard answers, the method also detects internal contradictions in the model's reasoning, such as a set of predicted relations that cannot all hold at once. This offers a new way of evaluating the temporal reasoning capabilities of LLMs.
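To make the idea of checking a temporal graph for internal contradictions concrete, here is a minimal sketch. It is not the thesis implementation, whose details are not given here. It assumes predictions arrive as `(event_a, event_b, label)` triples, and the function name `has_contradiction` is hypothetical. The sketch merges SIMULTANEOUS events into one node, turns BEFORE and AFTER into directed edges, ignores UNKNOWN, and reports a contradiction when the resulting graph contains a cycle.

```python
from collections import defaultdict

# The four relation labels named in the text.
LABELS = {"BEFORE", "AFTER", "SIMULTANEOUS", "UNKNOWN"}


def has_contradiction(relations):
    """Return True if the predicted relations cannot all hold at once.

    `relations` is an iterable of (event_a, event_b, label) triples.
    This input format is an assumption made for illustration.
    """
    relations = list(relations)
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Pass 1: merge events declared SIMULTANEOUS into a single node.
    for a, b, label in relations:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label}")
        if label == "SIMULTANEOUS":
            parent[find(a)] = find(b)

    # Pass 2: add directed "happens before" edges between merged nodes.
    edges = defaultdict(set)
    for a, b, label in relations:
        if label == "BEFORE":
            edges[find(a)].add(find(b))
        elif label == "AFTER":
            edges[find(b)].add(find(a))
        # UNKNOWN adds no constraint.

    # Pass 3: depth-first search; any cycle (including a self-loop)
    # means the ordering constraints are unsatisfiable.
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)  # every node starts WHITE

    def has_cycle(node):
        colour[node] = GREY
        for nxt in edges[node]:
            if colour[nxt] == GREY:
                return True
            if colour[nxt] == WHITE and has_cycle(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and has_cycle(n) for n in list(edges))


# Usage: A before B, B before C, C before A is a contradictory loop.
looped = [("A", "B", "BEFORE"), ("B", "C", "BEFORE"), ("C", "A", "BEFORE")]
print(has_contradiction(looped))
```

Cycle detection is only one possible validity check. A fuller validator might also compute the transitive closure to flag relations that are implied but missing, and whether the thesis does so is not stated here.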