Section 01
Introduction / Main Floor: CRAFT Benchmark: Multi-Agent Coordination Remains an Unsolved Problem (Strong Reasoning ≠ Good Collaboration)
The CRAFT benchmark requires multiple agents to build 3D structures collaboratively under incomplete information. Its results show that stronger reasoning ability does not translate into better coordination: small models often match or even outperform frontier systems.