Section 01
[Introduction] A Large-Scale Assessment by Scientists: Three Core Limitations of Modern AI in Scientific Innovation
A large-scale assessment invited the authors of 121,640 preprints, and 6,749 scientists took part. It identified three core limitations of current AI in generating scientific hypotheses. First, non-reasoning models fall into "groupthink". Second, no model spontaneously proposes null hypotheses. Third, automated evaluation agrees only weakly with the judgments of human experts. The study also proposed a reward model trained on human feedback. This model can improve accuracy by 27%, approaching the level of agreement seen in peer review.