Section 01
[Introduction] The Hidden Bottleneck in Small Language Models' Arithmetic Reasoning: Format Compliance, Not Reasoning Ability
Recent research indicates that small language models (SLMs) perform poorly on arithmetic reasoning tasks mainly because they struggle to meet strict output-format requirements, not because they lack reasoning ability. Evaluation methods that enforce such format constraints may therefore systematically underestimate what small models can actually do. This finding has important implications for how models are evaluated, how applications built on them are optimized, and how future training is directed.
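To make the evaluation point concrete, here is a minimal Python sketch under stated assumptions. It scores the same set of responses in two ways: a strict grader that accepts an answer only when it follows an exact output format, and a lenient grader that simply takes the last integer in the response. The `#### <integer>` final-line format (loosely modeled on the GSM8K answer convention), the function names `strict_score` and `lenient_score`, and the sample responses are all hypothetical illustrations. They are not the evaluation protocol used in the research described here.

```python
import re

# Hypothetical strict format: the final line must be exactly "#### <integer>".
STRICT_PATTERN = re.compile(r"^#### (-?\d+)\s*$")
# Lenient extraction: accept the last integer found anywhere in the response.
NUMBER_PATTERN = re.compile(r"-?\d+")


def strict_score(response: str, gold: int) -> bool:
    """Correct only if the last line matches the required format and value."""
    lines = response.strip().splitlines()
    if not lines:
        return False
    match = STRICT_PATTERN.match(lines[-1])
    return match is not None and int(match.group(1)) == gold


def lenient_score(response: str, gold: int) -> bool:
    """Correct if the last integer in the response equals the gold answer."""
    numbers = NUMBER_PATTERN.findall(response)
    return bool(numbers) and int(numbers[-1]) == gold


def accuracy(scorer, responses, golds) -> float:
    """Fraction of responses the given scorer marks as correct."""
    hits = sum(scorer(r, g) for r, g in zip(responses, golds))
    return hits / len(golds)


# Made-up illustrative responses: all three reach the correct value,
# but only the first follows the strict output format.
responses = [
    "3 boxes of 4 apples is 12 apples.\n#### 12",
    "3 * 4 = 12, so the answer is 12.",
    "The total is 12 apples.\nAnswer: 12",
]
golds = [12, 12, 12]

print(accuracy(strict_score, responses, golds))   # prints 0.3333333333333333
print(accuracy(lenient_score, responses, golds))  # prints 1.0
```

The gap between the two numbers comes entirely from the grader, not from the arithmetic: every response computes 12. A lenient extractor has its own risks (it can credit a stray number that happens to match), so it is shown here only to illustrate how strictness alone can shift measured accuracy.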