Section 01
AFI Cognitive Benchmark Test: Revealing the True Boundaries of Large Models' Reasoning (Introduction)
Large language models often score highly on standardized benchmarks yet perform poorly in complex real-world scenarios. The AFI Cognitive Benchmark Test focuses on three core dimensions: reasoning ability, robustness to interference, and logical consistency. Using more than 180 adversarial tasks, it exposes the gap between current large models and human-level reasoning, and it addresses a shortcoming of traditional tests, which mainly measure memory recall.