Section 01
[Introduction] Key Findings of the Robustness Study on LLM Automated Scoring Systems
This article presents an empirical analysis of the robustness of LLM-based automated scoring systems, examining how they perform when responses contain construct-irrelevant factors such as meaningless text padding, spelling errors, changes in writing complexity, and off-topic content. The study found that, unlike traditional systems, LLM-based systems apply a distinctive penalty to repeated text; that they are highly sensitive to off-topic content; and that they are notably robust to spelling errors, adjustments in writing complexity, and the addition of certain kinds of meaningless text (e.g., ability-prompt sentences, scenario restatements, and formulaic clichés).
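To make the perturbation categories concrete, here is a minimal sketch of how construct-irrelevant variants of a response might be generated before being sent to a scorer. All function names, the filler sentence, and the sample response are illustrative assumptions, not the study's actual implementation:

```python
import random

# Hypothetical perturbation generators for robustness probing.
# These mirror three factor types from the study: meaningless padding,
# spelling errors, and text repetition.

def add_filler(response, filler="In this essay, I will demonstrate my writing ability."):
    # Prepend a construct-irrelevant "ability prompt" sentence (one padding type).
    return filler + " " + response

def inject_typos(response, rate=0.05, seed=0):
    # Swap adjacent letters at random to simulate spelling errors;
    # the swap preserves overall length and vocabulary.
    rng = random.Random(seed)
    chars = list(response)
    for i in range(len(chars) - 1):
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def repeat_text(response, n=3):
    # Duplicate the response n times to probe the repetition penalty.
    return " ".join([response] * n)

original = "The water cycle moves water between oceans, air, and land."
perturbed = {
    "filler": add_filler(original),
    "typos": inject_typos(original),
    "repetition": repeat_text(original),
}
```

In a full harness, each perturbed variant would be scored alongside the original and the score deltas compared across factor types; the study's findings predict large deltas for repetition and off-topic content but small ones for typos and padding.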