Section 01
R-HORIZON: Introduction to the Benchmark Framework for Evaluating the Breadth and Depth Limits of Large Reasoning Models
R-HORIZON is an open-source benchmark framework designed to evaluate the limits of large reasoning models (LRMs) along two dimensions: reasoning breadth and reasoning depth. Existing benchmarks do not systematically reveal where a model's capabilities end; R-HORIZON aims to close that gap and help researchers and developers understand the true limits of reasoning models.