Finding 1: Simple Problems Are Easier to Verify
Simple problems involve fewer reasoning steps and a lower cognitive load, so verifiers are less likely to misjudge them. This suggests adjusting the verification strategy dynamically: a lightweight process for simple problems and a stricter mechanism for complex ones.
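The dynamic strategy described above can be sketched as a small router. This is a minimal illustration, not a method taken from the source: the `verify` function, the `threshold` value, the difficulty score in [0, 1], and the toy checks are all hypothetical, and in practice the difficulty estimate and the checks would come from real models.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Verdict:
    accepted: bool
    mode: str  # "light" or "strict"


def verify(
    answer: str,
    difficulty: float,
    light_check: Callable[[str], bool],
    strict_checks: Sequence[Callable[[str], bool]],
    threshold: float = 0.5,  # hypothetical cutoff, not from the source
) -> Verdict:
    """Route an answer to a light or strict path by estimated difficulty in [0, 1]."""
    if difficulty < threshold:
        # Simple problem: a single cheap check is assumed to be enough.
        return Verdict(light_check(answer), "light")
    # Complex problem: every strict check must pass.
    return Verdict(all(check(answer) for check in strict_checks), "strict")


# Toy checks, for illustration only.
def non_empty(answer: str) -> bool:
    return bool(answer.strip())


def ends_with_number(answer: str) -> bool:
    tokens = answer.split()
    return bool(tokens) and tokens[-1].rstrip(".").isdigit()


easy = verify("The answer is 4.", 0.2, non_empty, [non_empty, ends_with_number])
hard = verify("It follows trivially.", 0.9, non_empty, [non_empty, ends_with_number])
print(easy)  # light path, accepted
print(hard)  # strict path, rejected by ends_with_number
```

The design choice here is that the strict path only adds checks and never relaxes them, so a higher difficulty score can make acceptance harder but not easier.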
Finding 2: Errors from Weak Generators Are Easier to Detect
Weak generators tend to make obvious errors (logical breaks, irrelevant content), while strong generators make subtler ones (minor deviations in key steps). In the reported experiments, the performance gap between Gemma2-9B and Gemma2-27B narrows by 75.7% after verification, so pairing a weak generator with a verifier can be a cost-effective option.
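To make the "gap narrows by" figure concrete, the sketch below computes a relative gap reduction. It assumes the figure is the relative shrinkage of the accuracy gap between the two generators, which the text does not spell out, and the accuracies used are made-up placeholders rather than the experiment's numbers.

```python
def gap_reduction(weak_before, strong_before, weak_after, strong_after):
    """Relative shrinkage of the strong-minus-weak gap after verification.

    Assumed definition: (gap_before - gap_after) / gap_before.
    """
    gap_before = strong_before - weak_before
    gap_after = strong_after - weak_after
    if gap_before == 0:
        raise ValueError("no gap to narrow")
    return (gap_before - gap_after) / gap_before


# Hypothetical accuracies in percent, NOT the Gemma2 results.
reduction = gap_reduction(
    weak_before=40.0, strong_before=60.0,  # gap of 20 points
    weak_after=70.0, strong_after=75.0,    # gap of 5 points
)
print(f"{reduction:.1%}")  # 75.0%
```

Under this definition, a large reduction means the weak generator gains more from verification than the strong one does, which is the pattern the finding describes.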
Finding 3: Verification Ability Correlates with Problem-Solving Ability, but Not Linearly
A model's verification ability is usually positively correlated with its own problem-solving ability, but the relationship varies with problem difficulty. Strong verifiers do not hold an advantage in every case, and simply scaling up the model runs into bottlenecks.