Section 01
TopBench: A New Benchmark for Evaluating Large Models' Implicit Prediction and Reasoning Capabilities on Tables
TopBench is a benchmark for implicit prediction and reasoning in table question answering. It comprises 779 samples across four task types: single-point prediction, decision-making, treatment effect analysis, and complex filtering. It is designed to evaluate large models systematically on these tasks, expose the limitations of current models, and provide a standardized evaluation platform for related research and applications.