Section 01
[Introduction] When Humans Can See It, But AI Can't: Core Findings of LLM Visual Adversarial Attack Research
Paper Title: When Humans Can See It, But AI Can't: Research on Visual Adversarial Attacks Against Large Language Models
Original Author Team: arXiv Paper Author Team
Source Platform: arXiv
Publication Date: June 8, 2026
Original Link: http://arxiv.org/abs/2606.09700v1
Core Findings: Strategic typographic manipulation (for example, adjusting character spacing or adding visual emphasis) keeps harmful content clearly readable to humans while letting it evade detection by LLM content moderation systems. In the reported experiments, the attack success rate exceeds 86% while the machine detection rate stays below 1%, revealing a fundamental blind spot in the current LLM moderation ecosystem.
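To make the character-spacing idea concrete, here is a minimal sketch of the general mechanism, under loudly stated assumptions: it uses a toy keyword filter, not an LLM, and it is not the paper's method or evaluation setup. The blocklist term `"forbidden"`, the helpers `space_out`, `naive_filter`, and `normalized_filter`, and the normalization defense are all hypothetical and chosen only for illustration. The point it shows is that inserting ordinary or zero-width gaps between letters leaves the word readable to a human but changes the surface form that an automated check operates on. LLM tokenizers are likewise sensitive to surface form (spaced-out letters are typically split into different tokens than the intact word), which is one plausible intuition for the gap the paper reports.

```python
import re
import unicodedata

# Hypothetical blocklist for illustration only; not taken from the paper.
BLOCKLIST = {"forbidden"}


def space_out(word: str, gap: str = " ") -> str:
    """Insert `gap` between every character, mimicking a character-spacing manipulation."""
    return gap.join(word)


def naive_filter(text: str) -> bool:
    """Toy moderation check: True if a blocklisted term appears as a whole ASCII word."""
    words = re.findall(r"[a-z]+", text.lower())
    return any(w in BLOCKLIST for w in words)


def normalized_filter(text: str) -> bool:
    """Hypothetical defense: NFKC-normalize, keep only ASCII letters, then substring-match.

    NFKC folds compatibility forms such as fullwidth letters to ASCII; stripping every
    non-letter removes ordinary and zero-width spaces. The trade-off is more false
    positives, since terms can now match across real word boundaries.
    """
    folded = unicodedata.normalize("NFKC", text).lower()
    letters_only = re.sub(r"[^a-z]", "", folded)
    return any(term in letters_only for term in BLOCKLIST)


if __name__ == "__main__":
    spaced = "this is " + space_out("forbidden") + " content"
    print(naive_filter(spaced), normalized_filter(spaced))  # False True
```

The naive check misses the spaced-out word because it never sees the intact token, while the normalizing variant recovers it. Normalization of this kind only covers character-level tricks; visual emphasis and layout-based manipulations of the sort the paper describes would not necessarily be caught by it.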