Section 01
SYNTEXIS Benchmark: An Evaluation Framework for the Automatic Formalization of Mathematical Reasoning in Large Language Models
SYNTEXIS is a benchmark framework for evaluating the automatic formalization (autoformalization) capability of large language models (LLMs) on mathematical reasoning. Using chain-of-thought prompting, it tests a model's ability to translate natural-language mathematical problems into formal code that a theorem prover (e.g., Lean, Coq) can check. The framework emphasizes end-to-end evaluation, scoring a formalization by whether the generated code actually compiles and type-checks in the prover, together with multi-dimensional metrics, providing a standardized measurement tool for this field.
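To make the task concrete, here is a minimal sketch of what an autoformalization target might look like. The problem statement, the `IsEven` definition, and the theorem name `even_add_even` are illustrative assumptions rather than items from the SYNTEXIS dataset; the snippet assumes a recent Lean 4 toolchain (where the `omega` tactic is built in) and uses no external libraries.

```lean
-- Illustrative natural-language input: "Show that the sum of two
-- even natural numbers is even."
-- A candidate formalization the prover can check end-to-end:

def IsEven (n : Nat) : Prop := ∃ k, n = 2 * k

theorem even_add_even (a b : Nat) (ha : IsEven a) (hb : IsEven b) :
    IsEven (a + b) :=
  match ha, hb with
  | ⟨x, hx⟩, ⟨y, hy⟩ =>
    -- Witness: a + b = 2 * (x + y); `omega` closes the arithmetic goal.
    ⟨x + y, by omega⟩
```

The end-to-end check itself can be sketched as follows: a candidate counts as correct only if the prover accepts the generated file. The function name `check_lean_proof` and the direct `lean` invocation are assumptions for illustration (a file depending on a library such as Mathlib would instead be built inside a project via `lake`); nothing here is taken from the SYNTEXIS codebase.

```python
import subprocess
import tempfile
from pathlib import Path

def check_lean_proof(lean_source: str, timeout_s: int = 60) -> bool:
    """Return True iff the Lean toolchain accepts the candidate file.

    Hypothetical harness: writes the model output to a temporary .lean
    file and invokes the compiler; exit code 0 means every declaration
    elaborated, i.e., the formalization "executes" in the prover.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "Candidate.lean"
        src.write_text(lean_source, encoding="utf-8")
        try:
            result = subprocess.run(
                ["lean", str(src)],
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return False  # a proof attempt that hangs counts as a failure
        return result.returncode == 0
```

Under this scheme, a simple pass rate over a problem set is just the mean of these boolean outcomes, which is one plausible ingredient of the multi-dimensional metrics mentioned above.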