Section 01
[Introduction] Extended Empirical Study on Large Language Models for Multilingual Equivalent Mutant Detection
This study systematically evaluates how well large language models (GPT-4, DeepSeek-Coder, CodeLlama, and Qwen2.5-Coder, among others) detect equivalent mutants, that is, mutants that differ syntactically from the original program but behave identically, across multiple programming languages. Its results serve as a reference for automating mutation testing in software testing. The discussion covers the study's background, models, methods, findings, applications, and conclusions.