Section 01
[Introduction] Core Overview of LLM Paraphrase Consistency Research
This study evaluates the paraphrase consistency of large language models (LLMs) on multiple-choice commonsense question-answering tasks. Paraphrased versions of each question are filtered for semantic equivalence using natural language inference (NLI), and the models' answer consistency under these rewordings is then analyzed systematically. The research aims to characterize the current state of model robustness, providing empirical evidence and methodological support for improving the reliability of AI systems, guiding practical applications (e.g., in education and healthcare), and advancing AI safety alignment.
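The pipeline described above can be sketched in two steps: keep only paraphrases that an NLI model judges bidirectionally entailed by the original question, then measure how often the model's answer survives the rewording. The sketch below is illustrative only; `nli_entails` is a hypothetical placeholder (a real system would query an NLI model such as a cross-encoder), stubbed here with a toy token-overlap heuristic.

```python
# Minimal sketch of NLI-filtered paraphrase consistency evaluation.
# Assumption: `nli_entails` stands in for a real NLI entailment scorer;
# the token-overlap heuristic below is for illustration only.

def nli_entails(premise: str, hypothesis: str) -> float:
    """Placeholder entailment score in [0, 1]; replace with a real NLI model."""
    p, h = set(premise.lower().split()), set(hypothesis.lower().split())
    return len(p & h) / max(len(h), 1)

def is_semantically_equivalent(original: str, paraphrase: str,
                               threshold: float = 0.8) -> bool:
    """Keep a paraphrase only if entailment holds in both directions."""
    return (nli_entails(original, paraphrase) >= threshold
            and nli_entails(paraphrase, original) >= threshold)

def consistency_rate(original_answer: str, paraphrase_answers: list) -> float:
    """Fraction of paraphrases on which the model's answer is unchanged."""
    if not paraphrase_answers:
        return 1.0
    return sum(a == original_answer for a in paraphrase_answers) / len(paraphrase_answers)
```

A consistency rate of 1.0 means the model's answer was invariant to all retained paraphrases; lower values quantify the robustness gap the study investigates.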