Zing Forum

Reading

Taxonomy of Large Language Model Security Vulnerabilities: A Systematic Review of Four Core LLM Security Risks

A structured research report for students and researchers that categorizes large language model security threats into four major types—jailbreak attacks, prompt injection, data poisoning, and hallucinations—and provides a unified risk analysis framework.

大语言模型安全越狱攻击提示注入数据投毒幻觉AI安全漏洞分类对抗性攻击模型安全纵深防御
Published 2026-06-08 05:13Recent activity 2026-06-08 05:28Estimated read 6 min
Taxonomy of Large Language Model Security Vulnerabilities: A Systematic Review of Four Core LLM Security Risks
1

Section 01

[Introduction] Taxonomy of LLM Security Vulnerabilities: Four Core Risks and a Unified Analysis Framework

This project was completed by ketki1202 and Lalitha Sravanti Dast as the final project for the Fall 2025 LLM course, with the last update on June 7, 2026. It systematically reviews the four core security risks of large language models (LLMs)—jailbreak attacks, prompt injection, data poisoning, and hallucinations—and provides a unified risk analysis framework to help students and researchers understand the full landscape of LLM security threats.

2

Section 02

Project Background and Overview

With the widespread application of LLMs in critical business scenarios, their security issues have received increasing attention. However, existing security research is scattered and lacks systematic integration. This project aims to integrate fragmented research results into a unified, student-friendly framework, providing researchers and developers with a clear threat map. The project is sourced from GitHub with the original title Security-Taxonomy-of-Large-Language-Model-Vulnerabilities.

3

Section 03

Detailed Explanation of Four Core Security Vulnerability Categories

The project categorizes LLM security threats into four types:

  1. Jailbreak Attacks: Bypass security filters during the inference phase to induce harmful content generation. Key features: occur during inference, exploit filter weaknesses, require adversarial prompt design;
  2. Prompt Injection: Confuse trusted instructions with user input, leading to unauthorized operations or information leakage. Its logic is similar to traditional injection attacks;
  3. Data Poisoning: Manipulate training data to implant harmful behaviors. It is persistent and difficult to detect and fix after deployment;
  4. Hallucinations: Models generate false and unfounded outputs, stemming from probabilistic generation characteristics. No adversarial triggering is needed, which undermines reliability.
4

Section 04

Multi-Dimensional Unified Analysis Framework

The project proposes a multi-dimensional analysis framework:

  • Lifecycle Phase: Training time (data poisoning), inference time (jailbreak, prompt injection), across the entire lifecycle (hallucinations);
  • Attacker Intent: Adversarial (jailbreak, prompt injection), systemic (hallucinations);
  • Root Causes: Data integrity issues, instruction confusion, security filter weaknesses, probabilistic generation characteristics;
  • Main Impacts: Information leakage, harmful content generation, etc.;
  • Mitigation Strategies: Input filtering, output validation, adversarial training, etc.
5

Section 05

Core Insights: Vulnerability Essence and Defense Ideas

  • Common essence of jailbreak and prompt injection: Confusing the boundary between trusted instructions and user input;
  • Data poisoning characteristics: High concealment, persistence, vulnerabilities implanted before deployment;
  • Hallucinations: A byproduct of the probabilistic nature of LLMs, no malicious triggering required;
  • Defense requires deep defense: Data quality control, model alignment training, input validation, fact-checking, etc.
6

Section 06

Project Outputs, Value, and Limitations

Outputs: Final report, presentation slides, categorized literature directory; Academic Value: Integrate scattered research to lower entry barriers, student-friendly, provide theoretical foundation for subsequent research; Limitations: Literature review-focused with no experimental validation, simplified classification may omit edge vulnerabilities, no coverage of technical implementation details.

7

Section 07

Future Expansion Directions and Conclusion

Future Directions: Empirical research to quantify vulnerability impacts, develop automated detection tools, explore composite attack scenarios, establish standardized evaluation processes; Conclusion: This project provides a structured knowledge map for the LLM security field, helping to understand security challenges, and is a beneficial attempt at knowledge accumulation and educational dissemination.