Section 01
[Introduction] ReflexBench: The First Benchmark for Reflective Reasoning of Large Language Models
ReflexBench v1.0 is the first benchmark framework designed specifically to evaluate the reflective reasoning capabilities of large language models (LLMs), addressing a gap in existing evaluation suites: the self-awareness and meta-reasoning dimensions. This article introduces ReflexBench in detail, covering its background, design philosophy, technical methods, application value, and how it compares with existing benchmarks.
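Before the detailed sections, a minimal sketch may help make "reflective reasoning evaluation" concrete: the core idea is an answer-critique-revise loop whose improvement is scored. Note that everything here is an illustrative assumption, not ReflexBench's actual API; `ask_model` is a placeholder for any LLM client, and the exact-match scoring is a toy stand-in for a real metric.

```python
# Hypothetical sketch of a reflective-reasoning evaluation loop.
# None of these names come from ReflexBench; they are illustrative only.

def ask_model(prompt: str) -> str:
    """Placeholder for an LLM call (e.g., an API client). Plug in your own."""
    raise NotImplementedError("supply a model client here")

def evaluate_reflection(question: str, reference: str) -> dict:
    # Step 1: obtain an initial answer.
    first = ask_model(question)
    # Step 2: ask the model to critique and revise its own answer.
    revised = ask_model(
        f"Question: {question}\nYour answer: {first}\n"
        "Identify any mistakes in your answer, then give a corrected final answer."
    )
    # Step 3: score whether reflection helped (toy exact-match check;
    # a real benchmark would use graded rubrics or judge models).
    return {
        "initial_correct": reference in first,
        "revised_correct": reference in revised,
        "improved_by_reflection": (reference not in first) and (reference in revised),
    }
```

The later sections describe the methodology ReflexBench actually uses; this loop only conveys the general shape of the task.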