Section 01
Introduction to the TÜBİTAK Math Olympiad Benchmark Test: In-Depth Cost and Performance Comparison of 8 Large Models
This test compares 8 mainstream large language models (LLMs) on 32 multiple-choice questions from the 34th TÜBİTAK High School Math Olympiad (2026). Key findings: performance converges at the top (5 of the 8 models scored full marks), but costs differ significantly (the most expensive full-score model costs 22 times as much as the cheapest). Cost-effectiveness therefore becomes a critical factor in LLM selection. The test was published by BYALPERENK on GitHub on May 25, 2026, and aims to fill a gap left by traditional benchmarks, which focus only on accuracy and ignore cost.
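The cost comparison among full-score models described above comes down to simple arithmetic, which can be sketched as follows. This is a minimal illustration, not the benchmark's own code: the `ModelResult` type, the `full_score_cost_ratio` function, the model labels, and every number in `sample` are hypothetical placeholders, and the cost unit is left unspecified because the summary does not state one.

```python
from dataclasses import dataclass

TOTAL_QUESTIONS = 32  # number of multiple-choice questions in the test


@dataclass
class ModelResult:
    name: str     # hypothetical model label
    correct: int  # questions answered correctly
    cost: float   # total cost of the run, in any consistent unit (assumed > 0)


def full_score_cost_ratio(results):
    """Return (cheapest, priciest, ratio) among models with a perfect score."""
    perfect = [r for r in results if r.correct == TOTAL_QUESTIONS]
    if not perfect:
        raise ValueError("no model reached a full score")
    cheapest = min(perfect, key=lambda r: r.cost)
    priciest = max(perfect, key=lambda r: r.cost)
    return cheapest, priciest, priciest.cost / cheapest.cost


# Placeholder data for illustration only, NOT the benchmark's actual results.
sample = [
    ModelResult("model_a", 32, 1.0),
    ModelResult("model_b", 32, 4.0),
    ModelResult("model_c", 30, 0.5),  # not a full score, so it is excluded
]

cheapest, priciest, ratio = full_score_cost_ratio(sample)
print(f"{priciest.name} costs {ratio:.1f}x as much as {cheapest.name}")
```

Restricting the comparison to models with a perfect score keeps accuracy fixed, so the resulting ratio reflects cost alone.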