Section 01
MA-ProofBench Benchmark Released: Revealing Shortcomings of Large Models in Formal Reasoning for Advanced Mathematics
MA-ProofBench, the first formal theorem-proving benchmark dedicated to mathematical analysis, has been officially released. It contains 200 theorems drawn from measure theory, complex analysis, functional analysis, and other areas: 100 at the undergraduate level and 100 at the level of PhD qualifying exams. In testing, even the best-performing model, GPT-5.5, achieved a pass rate of only 5% on the PhD-level problems (5 of 100). This exposes significant limitations in current large models' formal reasoning about advanced mathematics. The benchmark fills a gap in existing evaluations for mathematical analysis and provides an important reference point for assessing AI's mathematical capabilities.