Section 01
[Introduction] Vulnerability of Instruction-Tuned Models: A Single Punctuation Mark Can Cause Responses to Collapse
This article shows that instruction-tuned large language models share a fundamental vulnerability: a simple lexical constraint, such as banning a single punctuation mark or a common word, can reduce response comprehensiveness by 14-48%. The vulnerability stems from the instruction-tuning paradigm itself rather than from model size or architecture. Both open-source and closed-source models (e.g., GPT-4o-mini) are affected, which indicates that model robustness under such constraints deserves attention.