Zing Forum

NDSS 2026: Practice and Hallucination Suppression of Lightweight Large Language Models in Security Incident Response

This post introduces the open-source project accompanying a paper accepted at NDSS 2026, which proposes a lightweight LLM-based decision support method for security incident response. The method reduces LLM hallucination through fine-tuning, information retrieval, and decision-theoretic planning. It runs on commodity hardware, and the project publicly releases the first fine-tuning dataset for security incident response.

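Tags: Cybersecurity, Incident Response, Large Language Models, LLM, Model Hallucination, Fine-tuning, DeepSeek, Qwen, NDSS, Security Operations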
Published 2026-06-02 19:45 | Recent activity 2026-06-02 19:50 | Estimated read 7 min

Section 01

NDSS 2026 Paper Introduction: Practice and Hallucination Suppression of Lightweight LLMs in Security Incident Response

This post introduces the open-source project accompanying a paper accepted at NDSS 2026 (GitHub repo: Kim-Hammar/llm_incident_response_ndss26, released on 2026-06-02). The project provides a lightweight LLM-based decision support method for security incident response that reduces model hallucination through domain-specific fine-tuning, retrieval of relevant information, and decision-theoretic planning. It runs on commodity hardware, and the project publicly releases the first fine-tuning dataset for security incident response.

Keywords: Cybersecurity, Incident Response, LLM, Model Hallucination, Fine-tuning, DeepSeek, Qwen, NDSS, Open-source Dataset, Decision Support

Section 02

Research Background and Problem Definition

Traditional security incident response relies on manual expertise, which is slow and error-prone. LLMs make automated decision support possible, but they face two core challenges. First, they hallucinate, generating plausible but incorrect responses. Second, existing solutions rely on prompt engineering of frontier models, which are costly and difficult to deploy on commodity hardware. A team from the University of Melbourne and Imperial College London proposes a lightweight LLM approach to address both issues.

Section 03

Core Innovations and Technical Approaches

The method combines three techniques: 1. Domain-specific fine-tuning: the authors build the first public fine-tuning dataset for security incident response and fine-tune DeepSeek-R1-Distill-Qwen-14B on it so that the model learns domain knowledge and decision-making patterns; 2. Retrieval augmentation: similar historical cases and best practices are retrieved to ground the model's output and reduce hallucination; 3. Decision-theoretic planning: the long-term impact of candidate action sequences is evaluated so that the recommended response is feasible. After optimization, the method runs on consumer-grade GPUs, which lowers the barrier to deployment.
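To make the retrieval step concrete, here is a minimal sketch that ranks a few past incidents against a new alert and places the best matches in a prompt. Everything in it is an assumption for illustration: the case texts, the `retrieve` and `build_prompt` helpers, and the bag-of-words cosine similarity are hypothetical and are not taken from the project's code, which may use a different retrieval method entirely.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, cases: List[str], k: int = 2) -> List[str]:
    """Return the k cases most similar to the query."""
    q = tokenize(query)
    ranked = sorted(cases, key=lambda c: cosine(q, tokenize(c)), reverse=True)
    return ranked[:k]


def build_prompt(alert: str, cases: List[str], k: int = 2) -> str:
    """Prepend retrieved cases to the alert so the model answers with context."""
    context = "\n".join(f"- {c}" for c in retrieve(alert, cases, k))
    return (
        f"Relevant past incidents:\n{context}\n\n"
        f"Current alert:\n{alert}\n\n"
        "Suggest the next response action."
    )


# Hypothetical past incidents (illustrative only, not from the dataset).
CASES = [
    "ssh brute force login attempts on gateway; blocked source ip at firewall",
    "ransomware encrypted file server; isolated host and restored from backup",
    "phishing email with malicious attachment; reset credentials of affected user",
]

if __name__ == "__main__":
    print(build_prompt("multiple failed ssh login attempts from one ip", CASES, k=1))
```

The idea is that the model sees documented precedent next to the alert and has less room to invent details. A production system would more likely use dense embeddings and a vector index than token overlap.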

Section 04

System Architecture and Workflow

The system works in three stages. In the input stage, it receives system logs, alerts, and other data and preprocesses them into a structured format. In the processing stage, the fine-tuned LLM obtains background knowledge from the retrieval module and reasons over it together with the input, and the decision-theoretic planning module then selects the best response sequence. In the output stage, it generates a structured response plan (specific actions, the reasons for them, and their expected effects) that analysts can review or that can be passed to automated processing.
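The planning and output stages can be sketched as a one-step expected-cost comparison over candidate actions. This is a simplified illustration, not the project's planner: the `Candidate` fields, the probabilities, and the time estimates are hypothetical values chosen for the example, and in the real system the candidates would come from the fine-tuned LLM.

```python
import json
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    action: str               # what to do
    reason: str               # why the action is proposed
    cost_minutes: float       # assumed time to execute the action
    p_effective: float        # assumed probability that the action works
    remaining_if_ok: float    # assumed recovery minutes left if it works
    remaining_if_fail: float  # assumed recovery minutes left if it fails


def expected_recovery_time(c: Candidate) -> float:
    """Expected total minutes to recovery if this action is taken first."""
    return (
        c.cost_minutes
        + c.p_effective * c.remaining_if_ok
        + (1.0 - c.p_effective) * c.remaining_if_fail
    )


def select_action(candidates: List[Candidate]) -> Candidate:
    """Pick the candidate with the lowest expected recovery time."""
    return min(candidates, key=expected_recovery_time)


def to_plan(c: Candidate) -> Dict[str, object]:
    """Structured output: action, reason, and expected effect."""
    return {
        "action": c.action,
        "reason": c.reason,
        "expected_recovery_minutes": round(expected_recovery_time(c), 1),
    }


# Hypothetical candidates for a compromised workstation (illustrative numbers).
CANDIDATES = [
    Candidate("Isolate host from network", "Stops lateral movement", 5, 0.9, 30, 120),
    Candidate("Reimage host immediately", "Removes persistence", 60, 0.99, 10, 120),
    Candidate("Continue monitoring", "Gathers more evidence", 0, 0.2, 30, 240),
]

if __name__ == "__main__":
    print(json.dumps(to_plan(select_action(CANDIDATES)), indent=2))
```

With these numbers, isolating the host wins because its low execution cost and high success probability give the shortest expected recovery time. A multi-step planner would apply the same comparison recursively over action sequences rather than single actions.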

Section 05

Detailed Explanation of Open-source Resources

The project open-sources the complete set of artifacts: 1. Fine-tuning dataset: the first public dataset of its kind, hosted on Hugging Face, containing real scenarios and expert-annotated responses covering multiple attack types; 2. Model weights: based on DeepSeek-R1-Distill-Qwen-14B and fine-tuned with LoRA, a parameter-efficient method that keeps the released files to a manageable size; 3. Reproducible code: a Python library with functions for dataset loading, fine-tuning, response generation, and more, supporting macOS and Ubuntu and compatible with Python 3.8 to 3.13.
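A short calculation shows why LoRA keeps the released weights small. For a weight matrix of shape d_out x d_in, LoRA trains two low-rank factors with rank * (d_in + d_out) parameters in total instead of d_in * d_out. The dimensions and rank below are illustrative assumptions, not the project's actual configuration.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Parameters in a dense d_out x d_in weight matrix."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two LoRA factors: B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    # Hypothetical square projection layer and rank (illustrative only).
    d_in, d_out, rank = 5120, 5120, 16
    full = full_params(d_in, d_out)
    lora = lora_params(d_in, d_out, rank)
    print(f"dense: {full:,} params, LoRA: {lora:,} params, ratio: {lora / full:.4%}")
```

For this hypothetical layer the adapter holds well under 1% of the dense layer's parameters, which is why an adapter for a 14B-parameter base model can be distributed as a comparatively small file.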

Section 06

Experimental Validation and Performance Evaluation

The experiments evaluate three aspects: 1. Hallucination suppression: the hallucination rate drops significantly after fine-tuning, and retrieval further improves accuracy; 2. Response quality: in blind expert evaluation, responses are rated close to human-expert level, with strong performance on common attacks; 3. Efficiency: generating a single response takes on the order of seconds on an RTX 8000, and the system runs stably on devices with 16 GB of VRAM, which meets real-time requirements.
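For context on the 16 GB figure, the following back-of-the-envelope estimate computes the memory needed for model weights alone at different precisions. It assumes a nominal 14 billion parameters and ignores the KV cache, activations, and framework overhead. The post does not state which precision the project uses, so the bit widths are assumptions.

```python
def weight_memory_gb(n_params: int, bits_per_param: int) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    n_params = 14_000_000_000  # nominal size of a "14B" model (assumption)
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit weights: {weight_memory_gb(n_params, bits):.1f} GB")
```

Under these assumptions, 16-bit weights alone would exceed 16 GB, so a 16 GB deployment would presumably rely on lower-precision weights or on offloading part of the model. The post does not say which approach is used.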

Section 07

Application Scenarios, Limitations, and Future Directions

Application scenarios: decision support in the SOC (shortening response time), security training (as learning material), and SOAR automation and orchestration (through platform integration). Limitations: limited coverage of attack types, optimization mainly for English, and latency under large-scale concurrent load. Future work: expanding the dataset (new attacks and industry scenarios), multi-language support, and model quantization and inference acceleration.