Section 01
Introduction: Core Insights from the Systematic Study on Quality Issues of Code Large Language Models
The Software Engineering Laboratory of Sun Yat-sen University reviewed 114 papers and built a causal mapping framework that links training data quality to the quality of generated code, showing how defects in the data propagate into defects in the code. The study contributes nine code quality dimensions, a classification system for training data quality issues, and 18 propagation mapping mechanisms. Together, these provide a systematic framework for improving the quality of code large language models.