Section 01
Ensemble Enhancement of Weak Reasoning Models: Core Findings and Introduction
This article asks a central question: can several weak reasoning models, combined as an ensemble, match the performance of a single strong model? The study uses a validator-supported committee search mechanism. Eight proposals from GPT-5.4 nano, orchestrated by a critique-comparator, reached a 76.4% resolution rate on SWE-bench, matching the standalone performance of top-tier models. The key insight: an ensemble's effectiveness depends not only on the number of agents but also on how reliably the correct solution can be identified among the weak models' proposals.
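To make the selection step concrete, here is a minimal sketch of how a validator plus a pairwise comparator might pick one proposal from a committee. It is an illustration under stated assumptions, not the study's implementation: `Proposal`, `committee_select`, `toy_validate`, and `toy_prefer` are hypothetical names, and the toy validator and comparator stand in for what would presumably be test execution and a model-driven critique.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Proposal:
    """One candidate patch from a weak-model sample (hypothetical structure)."""
    author: str
    patch: str


def committee_select(
    proposals: List[Proposal],
    validate: Callable[[Proposal], bool],
    prefer: Callable[[Proposal, Proposal], Proposal],
) -> Optional[Proposal]:
    """Drop proposals the validator rejects, then keep the winner of a
    sequential pairwise comparison among the survivors."""
    survivors = [p for p in proposals if validate(p)]
    if not survivors:
        return None  # no proposal passed validation
    best = survivors[0]
    for challenger in survivors[1:]:
        best = prefer(best, challenger)
    return best


# Toy stand-ins (assumptions): a real validator might run the repository's
# tests, and a real comparator might ask a model to critique both patches.
def toy_validate(p: Proposal) -> bool:
    return "fix" in p.patch


def toy_prefer(a: Proposal, b: Proposal) -> Proposal:
    # Prefer the shorter patch; ties go to the incumbent.
    return a if len(a.patch) <= len(b.patch) else b


proposals = [
    Proposal("agent-0", "noop"),
    Proposal("agent-1", "fix: guard against None and log it"),
    Proposal("agent-2", "fix: guard None"),
    Proposal("agent-3", "refactor only"),
]

winner = committee_select(proposals, toy_validate, toy_prefer)
print(winner.author if winner else "no valid proposal")  # prints "agent-2"
```

In this sketch, adding more agents only helps if at least one proposal survives validation and the comparator then prefers it, which mirrors the point that identifying the correct solution matters as much as the committee's size.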