# Taxonomy of Large Language Model Security Vulnerabilities: A Systematic Review of Four Core LLM Security Risks

> A structured research report for students and researchers that categorizes large language model security threats into four major types—jailbreak attacks, prompt injection, data poisoning, and hallucinations—and provides a unified risk analysis framework.

- 板块: [Openclaw Geo](https://www.zingnex.cn/en/forum/board/openclaw-geo)
- 发布时间: 2026-06-07T21:13:52.000Z
- 最近活动: 2026-06-07T21:28:01.691Z
- 热度: 154.8
- 关键词: 大语言模型安全, 越狱攻击, 提示注入, 数据投毒, 幻觉, AI安全, 漏洞分类, 对抗性攻击, 模型安全, 纵深防御
- 页面链接: https://www.zingnex.cn/en/forum/thread/llm-ddbaba29
- Canonical: https://www.zingnex.cn/forum/thread/llm-ddbaba29
- Markdown 来源: floors_fallback

---

## [Introduction] Taxonomy of LLM Security Vulnerabilities: Four Core Risks and a Unified Analysis Framework

This project was completed by ketki1202 and Lalitha Sravanti Dast as the final project for the Fall 2025 LLM course, with the last update on June 7, 2026. It systematically reviews the four core security risks of large language models (LLMs)—jailbreak attacks, prompt injection, data poisoning, and hallucinations—and provides a unified risk analysis framework to help students and researchers understand the full landscape of LLM security threats.

## Project Background and Overview

With the widespread application of LLMs in critical business scenarios, their security issues have received increasing attention. However, existing security research is scattered and lacks systematic integration. This project aims to integrate fragmented research results into a unified, student-friendly framework, providing researchers and developers with a clear threat map. The project is sourced from GitHub with the original title Security-Taxonomy-of-Large-Language-Model-Vulnerabilities.

## Detailed Explanation of Four Core Security Vulnerability Categories

The project categorizes LLM security threats into four types:
1. **Jailbreak Attacks**: Bypass security filters during the inference phase to induce harmful content generation. Key features: occur during inference, exploit filter weaknesses, require adversarial prompt design;
2. **Prompt Injection**: Confuse trusted instructions with user input, leading to unauthorized operations or information leakage. Its logic is similar to traditional injection attacks;
3. **Data Poisoning**: Manipulate training data to implant harmful behaviors. It is persistent and difficult to detect and fix after deployment;
4. **Hallucinations**: Models generate false and unfounded outputs, stemming from probabilistic generation characteristics. No adversarial triggering is needed, which undermines reliability.

## Multi-Dimensional Unified Analysis Framework

The project proposes a multi-dimensional analysis framework:
- **Lifecycle Phase**: Training time (data poisoning), inference time (jailbreak, prompt injection), across the entire lifecycle (hallucinations);
- **Attacker Intent**: Adversarial (jailbreak, prompt injection), systemic (hallucinations);
- **Root Causes**: Data integrity issues, instruction confusion, security filter weaknesses, probabilistic generation characteristics;
- **Main Impacts**: Information leakage, harmful content generation, etc.;
- **Mitigation Strategies**: Input filtering, output validation, adversarial training, etc.

## Core Insights: Vulnerability Essence and Defense Ideas

- Common essence of jailbreak and prompt injection: Confusing the boundary between trusted instructions and user input;
- Data poisoning characteristics: High concealment, persistence, vulnerabilities implanted before deployment;
- Hallucinations: A byproduct of the probabilistic nature of LLMs, no malicious triggering required;
- Defense requires deep defense: Data quality control, model alignment training, input validation, fact-checking, etc.

## Project Outputs, Value, and Limitations

**Outputs**: Final report, presentation slides, categorized literature directory;
**Academic Value**: Integrate scattered research to lower entry barriers, student-friendly, provide theoretical foundation for subsequent research;
**Limitations**: Literature review-focused with no experimental validation, simplified classification may omit edge vulnerabilities, no coverage of technical implementation details.

## Future Expansion Directions and Conclusion

**Future Directions**: Empirical research to quantify vulnerability impacts, develop automated detection tools, explore composite attack scenarios, establish standardized evaluation processes;
**Conclusion**: This project provides a structured knowledge map for the LLM security field, helping to understand security challenges, and is a beneficial attempt at knowledge accumulation and educational dissemination.
