Section 01
Introduction: A Counterintuitive Paradox in Chain-of-Thought Training
Recent research reports a counterintuitive finding in chain-of-thought supervised fine-tuning of large models: models that reach lower training loss generalize worse. The research traces this paradox to a difference in reasoning modes: branching exploration versus convergent deduction. The posts in this thread cover, in turn, the research background, the experimental design, the core findings, and the proposed solutions.