Section 01
[Introduction] Rosetta-PL: A New Benchmark for Evaluating Logical Reasoning Capabilities of Large Language Models
Researchers created the Rosetta-PL benchmark by translating logical propositions from the Lean theorem prover into a custom logical language, and they use it to systematically evaluate how well large language models perform formal reasoning. Because the custom language is unlikely to appear in pretraining data, the benchmark reveals how models acquire logical rules from scratch, offering guidance for low-resource language applications and for optimizing model training.
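To make the translation step concrete, the following is a minimal sketch of how a proposition might be rewritten into a custom logical language via token substitution; the symbol and variable mappings, the function name, and the example propositions are illustrative assumptions, not the paper's actual lexicon or pipeline.

```python
# Hypothetical sketch: rewriting a Lean-style proposition into a custom
# logical language by token substitution. All mappings below are invented
# for illustration and do not reproduce the benchmark's real vocabulary.

# Assumed mapping from familiar logical connectives to made-up tokens.
SYMBOL_MAP = {
    "->": "zim",   # implication
    "/\\": "kor",  # conjunction
    "\\/": "dax",  # disjunction
    "not": "ruk",  # negation
}

# Assumed renaming of propositional variables to unfamiliar identifiers.
VAR_MAP = {"p": "blu", "q": "tor", "r": "mel"}


def translate(proposition: str) -> str:
    """Rewrite a whitespace-tokenized proposition into the custom language."""
    out = []
    for token in proposition.split():
        out.append(SYMBOL_MAP.get(token, VAR_MAP.get(token, token)))
    return " ".join(out)


if __name__ == "__main__":
    # Under the assumed mappings, "p -> q" becomes "blu zim tor".
    print(translate("p -> q"))
    print(translate("not p \\/ ( q /\\ r )"))
```

The key design idea this sketch illustrates is that the logical structure of each proposition is preserved while every surface symbol is replaced, so any success the model shows must come from learning the rules of the new language rather than from memorized notation.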