Section 01
[Introduction] FrontierFinance Benchmark: Long-Running Task Evaluation of LLMs in Real Financial Scenarios
Introduction to the FrontierFinance Benchmark
FrontierFinance is a long-running computer-use benchmark for real-world financial work. It consists of 25 complex financial modeling tasks, which require on average more than 18 hours of work by a human professional. Its purpose is threefold: to evaluate how LLMs perform in real-world professional finance settings, to close the gap between existing benchmarks and the demands of actual professional work, and to provide a rigorous reference point for applying AI in finance.