Section 01
[Introduction] The Step Confusion Trap in Large Model Reasoning Data Selection and How to Correct It
Recent research has found that naturalness-based data selection methods are systematically biased when scoring reasoning data for large models: they tend to favor samples with longer reasoning steps over samples of higher quality. The researchers propose two correction methods, ASLEC-DROP and ASLEC-CASL, which significantly improve the accuracy of reasoning data selection by removing the interference of initial word probabilities. The posts below walk through the problem and its solutions in turn.