Section 01
[Introduction] XiangQi-LLM-Arena: Evaluating LLM Long-Range Reasoning Capabilities Using Chinese Chess
Introducing XiangQi-LLM-Arena, an open-source scientific benchmark environment designed to quantitatively evaluate the long-range logical reasoning capabilities of large language models (LLMs) through games of Chinese Chess (Xiangqi). The project addresses problems common to traditional evaluation benchmarks, such as data contamination and subjective scoring standards, and provides an objective, contamination-resistant platform for evaluating LLM reasoning.