Section 01
[Introduction] LLM System Instruction Security Vulnerability: Encoding Attacks Can Bypass Protections to Steal Sensitive Information
Researchers have found that attackers can bypass LLM's rejection mechanisms and steal sensitive content from system instructions by packaging information extraction requests as encoding or structured output tasks. The study proposes an automated evaluation framework and a chain-of-thought-based mitigation strategy, providing a new direction for LLM security protection.