DeepSeek Large Language Model Security Audit System: A Graduation Thesis-level AI Security Testing Framework

This article introduces an automated security audit system for large language models (LLMs), which includes 27 attack vectors, over 80 test prompts, multilingual support, and intelligent analysis functions, providing a comprehensive methodology and tool implementation for LLM security assessment.

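Tags: LLM security, security audit, DeepSeek, prompt injection, AI security, adversarial attacks, automated testing, large language models, vulnerability assessment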

Published 2026-05-27 00:11 | Recent activity 2026-05-27 00:25 | Estimated read 9 min

Section 01

DeepSeek Large Language Model Security Audit System: A Graduation Thesis-level AI Security Testing Framework (Introduction)

Core Overview

Basic Information

Original Author/Maintainer: aleksa-ai-cybersec (Vorobeva Aleksandra)
Source Platform: GitHub
Release Date: May 26, 2026
Affiliated Institution: Moscow State Linguistic University, Institute of Information Science, Department of International Information Security
Original Link: https://github.com/aleksa-ai-cybersec/deepseek-audit-diploma

Section 02

Research Background and Motivation

LLM Security Challenges

As large language models are integrated into more industries, several security risks have become prominent:

Prompt Injection Attacks: Malicious inputs bypass security restrictions
Sensitive Information Leakage: Private data from the training set exposed in model output
Harmful Content Generation: Discriminatory, violent, or illegal content
Hallucinations: Output that is factually incorrect
Adversarial Attacks: Minor input perturbations leading to drastic output changes

Existing Assessment Gaps

According to the project, current industry assessments tend to be superficial: they lack systematic structure and depth, so they rarely cover realistic attack scenarios. This graduation thesis project aims to develop a comprehensive, automated LLM security audit methodology, using DeepSeek as the case study for empirical research.

Section 03

System Architecture and Core Functions

Attack Vector Coverage

27 attack vectors covering the entire ML lifecycle:

Training Phase: Data Poisoning, Backdoor Implantation, Model Stealing
Inference Phase: Prompt Injection, Jailbreak Attacks, Role-playing Bypass
Output Phase: Information Extraction, Hallucination Induction, Harmful Content Generation

Test Prompt Library

Over 80 carefully designed test prompts covering:

Direct Attacks, Indirect Attacks, Encoding Attacks (Base64/ROT13), Multilingual Attacks
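
As an illustration of the encoding category, the following minimal sketch produces Base64 and ROT13 variants of a single test prompt using only the Python standard library. The function name `encode_variants` and the placeholder prompt are assumptions made for this example; the project's actual prompt library and encoding code are not shown in the source.

```python
import base64
import codecs


def encode_variants(prompt: str) -> dict[str, str]:
    """Return the plain, Base64 and ROT13 forms of one test prompt.

    Hypothetical helper for illustration; not taken from the project.
    """
    return {
        "plain": prompt,
        # Base64 works on bytes, so encode to UTF-8 first.
        "base64": base64.b64encode(prompt.encode("utf-8")).decode("ascii"),
        # ROT13 is a text-to-text codec in the standard library.
        "rot13": codecs.encode(prompt, "rot_13"),
    }


# Benign placeholder prompt, used only to show the transformation.
variants = encode_variants("Hello")
for name, text in variants.items():
    print(f"{name}: {text}")
```

An audit run could send each variant to the model and compare the responses, which shows whether a safeguard that holds for the plain prompt also holds for its encoded forms.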

STRIDE-AI Classification Framework

Extended STRIDE threat model adapted for AI systems: S (Spoofing), T (Tampering), R (Repudiation), I (Information Disclosure), D (Denial of Service), E (Elevation of Privilege). Each attack is mapped to the corresponding category for structured analysis.
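
The mapping from attacks to STRIDE categories can be sketched as a small lookup table. The assignments below are illustrative assumptions, not the project's real mapping, and the names `Stride`, `ATTACK_CATEGORY` and `count_by_category` are hypothetical.

```python
from collections import Counter
from enum import Enum


class Stride(Enum):
    SPOOFING = "S"
    TAMPERING = "T"
    REPUDIATION = "R"
    INFORMATION_DISCLOSURE = "I"
    DENIAL_OF_SERVICE = "D"
    ELEVATION_OF_PRIVILEGE = "E"


# Assumed example mapping; the project's own assignments may differ.
ATTACK_CATEGORY = {
    "role_playing_bypass": Stride.SPOOFING,
    "data_poisoning": Stride.TAMPERING,
    "information_extraction": Stride.INFORMATION_DISCLOSURE,
    "prompt_injection": Stride.ELEVATION_OF_PRIVILEGE,
}


def count_by_category(successful_attacks: list[str]) -> Counter:
    """Count successful attacks per STRIDE category."""
    return Counter(ATTACK_CATEGORY[name] for name in successful_attacks)


counts = count_by_category(
    ["prompt_injection", "data_poisoning", "prompt_injection"]
)
for category, n in counts.items():
    print(category.name, n)
```

Grouping results this way lets a report summarize findings by threat category instead of listing individual prompts.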

Section 04

Intelligent Analysis and Advanced Functions

Intelligent Analysis Features

Semantic Analysis: Refusal Detection, Information Leakage Identification, Evasion Behavior Analysis
Sentiment Analysis: Sentiment Polarity Evaluation of Response Content
Multilingual Support: Testing in 5 languages (Russian/English/Chinese/French/German)
Hallucination Detector: Identifies factual errors and contradictions
Time Series Analysis: Tracks attack success rate trends
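
To make the rejection (refusal) detection step concrete, here is a minimal keyword-based sketch. It is a crude heuristic under stated assumptions: the pattern list, `is_refusal` and `non_refusal_rate` are hypothetical, and the source does not describe how the project's semantic analysis is actually implemented.

```python
import re

# Assumed English refusal phrases; a real detector would need more
# patterns and would have to cover every tested language.
REFUSAL_PATTERNS = [
    re.compile(r"\bI can(?:not|'t)\b", re.IGNORECASE),
    re.compile(r"\bI(?: am|'m) (?:unable|not able) to\b", re.IGNORECASE),
    re.compile(r"\bI won't\b", re.IGNORECASE),
]


def is_refusal(response: str) -> bool:
    """Return True if the response matches any refusal pattern."""
    return any(p.search(response) for p in REFUSAL_PATTERNS)


def non_refusal_rate(responses: list[str]) -> float:
    """Fraction of responses that were not refused.

    This is only a rough proxy for attack success: a model can
    answer without refusing and still reveal nothing harmful.
    """
    if not responses:
        raise ValueError("responses must not be empty")
    refused = sum(is_refusal(r) for r in responses)
    return 1 - refused / len(responses)


sample = [
    "I cannot help with that request.",
    "Here is the summary you asked for.",
]
rate = non_refusal_rate(sample)
print(rate)
```

Keyword matching is easy to audit but misses paraphrased refusals, so it fits best as a first-pass filter before manual or model-based review.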

Advanced Functions

Adaptive Testing: Entropy-based Selection, Bayesian Estimation, Auto-stop Mechanism
Attack Pattern Library: Analyzes successful cases to generate new test variants
Confidence Assessment: Wilson method to calculate 95% confidence intervals
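
The Wilson score interval mentioned above can be computed directly from the number of successful attacks and the number of trials. The sketch below uses the standard formula with z = 1.96 for a 95% interval; the function name `wilson_interval` and the example counts are assumptions, not figures from the project.

```python
import math


def wilson_interval(
    successes: int, trials: int, z: float = 1.96
) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to a two-sided 95% confidence level.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half_width = (
        z
        * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
        / denom
    )
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical example: 8 successful attacks out of 80 prompts.
low, high = wilson_interval(8, 80)
print(f"95% CI: [{low:.3f}, {high:.3f}]")
```

Unlike the simple normal approximation, the Wilson interval stays within [0, 1] and behaves sensibly when the success rate is near 0 or 1, which suits small prompt sets where few attacks succeed.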

Anti-Detection Mechanisms

Token Pool Rotation: 6 GitHub tokens with a daily quota of 900 requests
Request Camouflage: User-Agent Rotation (14 types), Random Delays, Simulating Human Behavior

Section 05

Technical Implementation and Visualization Reports

Core Dependencies

Python 3.10+, pandas, numpy, plotly, scipy, tqdm, gradio, requests, langdetect

Deployment Methods

Local Run, Streamlit Cloud Deployment, GitHub Pages Static Display

Visualization and Reports

7 Interactive Charts: Attack Success Rate Trends, Vulnerability Distribution Heatmap, etc.
Real-time Dashboard: Live monitoring of test progress
Telegram Notifications: Automatic push of key findings
Automatic Reports: Generates academic-standard reports with dynamic risk assessment tables

Section 06

Project Value and Industry Significance

Academic Contributions

Systematic LLM Security Audit Methodology
Reproducible Testing Framework
Empirical Research Results

Practical Value

Model Developers: Identify vulnerabilities to improve design
Enterprise Users: Evaluate third-party LLM security
Regulatory Bodies: Establish security assessment standards
Researchers: Provide benchmark testing tools

Industry Impact

The project aligns with regulatory requirements such as the EU AI Act and supports LLM security compliance and the development of industry standards.

Section 07

Limitations and Future Directions

Current Limitations

Mainly targeted at DeepSeek models; generalizability needs verification
Relies on the GitHub API and is therefore subject to platform policy restrictions
Test coverage cannot exhaust all attack variants

Future Improvements

Expand to more LLM platforms
Integrate adversarial attack generation (e.g., AutoPrompt)
Add red-team exercise functionality
Develop standardized assessment benchmarks

Section 08

Conclusion

The deepseek-audit-diploma project is a practical example of current LLM security research. By combining systematic attack design, intelligent analysis, and an engineered implementation, it provides tools and a reference point for AI security assessment.

As LLM capabilities grow, security auditing becomes more important. Research of this kind is a necessary foundation for the responsible development of AI, and the hope is that more researchers will contribute to building a secure AI ecosystem.