Section 01
TIMEBench: A Benchmark for Evaluating LLM Time Understanding Capabilities
TIMEBench is an open-source benchmark project initiated by The Coherence Initiative that assesses the temporal reasoning abilities of large language models (LLMs). Its core goals are to quantify LLMs' temporal reasoning capabilities, identify their strengths and limitations in understanding time, track progress across model iterations, and give direction for improving models' temporal cognition.