# NDSS 2026: Practice and Hallucination Suppression of Lightweight Large Language Models in Security Incident Response

> This post introduces the open-source project accompanying an NDSS 2026 paper that proposes a lightweight, LLM-based decision-support method for security incident response. The method suppresses hallucinations through fine-tuning, information retrieval, and decision-theoretic planning. It runs on ordinary hardware and comes with the first public fine-tuning dataset for security incident response.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-06-02T11:45:14.000Z
- Last activity: 2026-06-02T11:50:58.166Z
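- Keywords: Cybersecurity, Incident Response, Large Language Models, LLM, Model Hallucination, Fine-tuning, DeepSeek, Qwen, NDSS, Security Operations, SOC, Open-source Dataset, Decision Support, Generative AI
- Page link: https://www.zingnex.cn/en/forum/thread/ndss-2026
- Canonical: https://www.zingnex.cn/forum/thread/ndss-2026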

---

## NDSS 2026 Paper Introduction: Practice and Hallucination Suppression of Lightweight LLMs in Security Incident Response

This post introduces the open-source project accompanying the NDSS 2026 paper (GitHub repository: Kim-Hammar/llm_incident_response_ndss26, released on 2026-06-02). The paper presents a lightweight, LLM-based decision-support method for security incident response that mitigates hallucination through domain-specific fine-tuning, retrieval augmentation, and decision-theoretic planning. The method runs on ordinary hardware, and the project releases the first public fine-tuning dataset for security incident response.

## Research Background and Problem Definition

Traditional security incident response relies on analysts' manual experience, which is slow and error-prone. LLMs open the door to automated decision-making, but they face a core challenge: hallucination, i.e., generating responses that sound plausible but are incorrect. Existing solutions rely on prompt engineering with frontier models, which are costly and difficult to deploy on ordinary hardware. A team from the University of Melbourne and Imperial College London proposed a lightweight LLM approach to address these issues.

## Core Innovations and Technical Approaches

The method combines three key techniques:

1. Domain-specific fine-tuning: the authors build the first public fine-tuning dataset for security incident response and use it to fine-tune DeepSeek-R1-Distill-Qwen-14B, so that the model learns domain knowledge and response decision patterns.
2. Information retrieval augmentation: the system retrieves similar historical cases and best practices and supplies them as context, which reduces hallucination.
3. Decision-theoretic planning: the system evaluates the long-term impact of candidate action sequences to ensure that the chosen response is feasible.

After optimization, the model runs on ordinary consumer-grade GPUs, lowering the barrier to deployment.
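To make the retrieval step concrete, here is a minimal sketch of retrieval augmentation, assuming a small in-memory corpus of past incidents and plain bag-of-words cosine similarity. The corpus, `retrieve`, and `build_prompt` are hypothetical illustrations, not the project's API; a production system would more likely use dense embeddings and a vector index.

```python
import math
import re
from collections import Counter
from typing import List

# Hypothetical knowledge base of past incidents (illustrative only; the
# project's actual retrieval corpus and similarity function may differ).
HISTORICAL_CASES = [
    "Ransomware encrypted file shares; isolate hosts, restore from offline backups.",
    "Brute-force SSH logins from a single IP; block the IP and enforce key-based auth.",
    "Phishing email delivered credential-harvesting link; reset passwords, purge email.",
]


def tokenize(text: str) -> Counter:
    """Lowercased bag-of-words term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, cases: List[str], k: int = 2) -> List[str]:
    """Return the k cases most similar to the query."""
    q = tokenize(query)
    return sorted(cases, key=lambda c: cosine(q, tokenize(c)), reverse=True)[:k]


def build_prompt(alert: str, cases: List[str]) -> str:
    """Prepend retrieved cases so the model can ground its answer in known incidents."""
    context = "\n".join(f"- {c}" for c in retrieve(alert, cases))
    return (
        f"Similar past incidents:\n{context}\n\n"
        f"Current alert:\n{alert}\n\n"
        "Recommend response actions:"
    )


print(build_prompt("Many failed SSH logins from one IP address", HISTORICAL_CASES))
```

The retrieved cases act as grounding context: the model is asked to answer with reference to concrete prior incidents rather than from its parametric memory alone.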

## System Architecture and Workflow

The system works in three stages:

1. Input: the system receives system logs, alerts, and other data and preprocesses them into a structured format.
2. Processing: the fine-tuned LLM obtains background knowledge through the retrieval module and reasons over it together with the input; the decision-theoretic planning module then selects the optimal response sequence.
3. Output: the system generates a structured response plan (specific actions, the reason for each, and its expected effect) that analysts can read or that can be processed automatically.
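The processing and output stages can be illustrated with a toy decision-theoretic selector. This sketch assumes the LLM has already proposed several candidate action sequences and that each action has a known cost and probability of containing the incident. These numbers, the `Action` type, and the simple expected-cost model are hypothetical stand-ins for the paper's planning method, which this post does not detail. The plan with the lowest expected cost is selected and rendered as structured output (action, reason, expected effect).

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Action:
    name: str
    reason: str           # why the action is taken (shown to the analyst)
    expected_effect: str  # what the action should achieve
    cost: float           # hypothetical operational cost of executing it
    p_contain: float      # hypothetical probability it contains the incident


def expected_cost(plan: Sequence[Action], damage_per_step: float, residual_penalty: float) -> float:
    """Expected total cost of running the actions in order until the incident is contained.

    Each step costs the action's cost plus ongoing damage, but only if the incident
    is still open; if every action fails, a residual penalty is incurred.
    """
    total = 0.0
    p_open = 1.0  # probability the incident is still uncontained before this step
    for a in plan:
        total += p_open * (a.cost + damage_per_step)
        p_open *= 1.0 - a.p_contain
    return total + p_open * residual_penalty


def select_plan(candidates: List[List[Action]], damage_per_step: float = 10.0,
                residual_penalty: float = 100.0) -> List[Action]:
    """Pick the candidate sequence with the lowest expected cost."""
    return min(candidates, key=lambda p: expected_cost(p, damage_per_step, residual_penalty))


def render(plan: Sequence[Action]) -> str:
    """Structured, analyst-readable response plan."""
    return "\n".join(
        f"{i}. {a.name}: {a.reason} (expected effect: {a.expected_effect})"
        for i, a in enumerate(plan, start=1)
    )


# Hypothetical candidate actions, e.g. as proposed by the LLM.
SCAN = Action("Run malware scan", "Identify the payload", "Locate infected files", 1.0, 0.3)
ISOLATE = Action("Isolate host", "Stop lateral movement", "Host cut off from network", 5.0, 0.9)
REIMAGE = Action("Reimage host", "Remove persistence", "Clean system state", 20.0, 0.95)

CANDIDATES = [[SCAN, ISOLATE], [ISOLATE, REIMAGE], [SCAN]]
best = select_plan(CANDIDATES)
print(render(best))
```

With these illustrative numbers, the expected costs are 28.5 for scan-then-isolate, 18.5 for isolate-then-reimage, and 81 for scanning alone, so isolating first wins even though it is more expensive per step: acting early lowers the probability of paying the ongoing damage and the residual penalty later.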

## Detailed Explanation of Open-source Resources

The project releases a complete set of artifacts:

1. Fine-tuning dataset: the first public dataset of its kind, hosted on Hugging Face, containing real-world scenarios and expert-annotated responses that cover multiple attack types.
2. Model weights: based on DeepSeek-R1-Distill-Qwen-14B and fine-tuned with LoRA (parameter-efficient fine-tuning), which keeps the file size manageable.
3. Reproducible code: a Python library with functions for loading the dataset, fine-tuning, generating responses, and more; it supports macOS and Ubuntu and is compatible with Python 3.8 to 3.13.
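The small size of LoRA weights follows from the standard LoRA parameterization: the pretrained matrix $W_0 \in \mathbb{R}^{d \times k}$ stays frozen, and only a low-rank update is trained and stored. The dimensions and rank in the last line are illustrative assumptions; the post does not state the rank or which layers the project adapts.

```latex
W = W_0 + \Delta W = W_0 + BA, \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)

\text{trainable parameters per adapted matrix: } r(d + k) \quad \text{vs.} \quad dk \text{ for full fine-tuning}

d = k = 5120,\; r = 16: \quad \frac{r(d + k)}{dk} = \frac{16 \cdot 10240}{5120 \cdot 5120} = \frac{163{,}840}{26{,}214{,}400} = 0.625\%
```

Under these assumptions, each adapted matrix adds well under 1% of its original parameter count, which is why LoRA adapters are typically a small fraction of the base model's size.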

## Experimental Validation and Performance Evaluation

The experiments evaluate the method along three dimensions:

1. Hallucination suppression: fine-tuning significantly reduces the hallucination rate, and information retrieval further improves accuracy.
2. Response quality: in blind evaluation by experts, the responses come close to human-expert level and perform especially well on common attacks.
3. Efficiency: generating a single response takes seconds on an RTX 8000 GPU, and the model runs stably on devices with 16 GB of VRAM, which meets real-time requirements.
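A back-of-the-envelope estimate helps put the 16 GB figure in context. The sketch below computes the memory needed just for the weights of a model with roughly 14 billion parameters at common precisions. It ignores the KV cache, activations, and framework overhead, and the post does not say which precision the project uses on 16 GB devices.

```python
# Approximate parameter count of a 14B model (illustrative round number).
PARAMS = 14e9

# Bytes per parameter for common weight formats.
BYTES_PER_PARAM = {"fp16/bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


for fmt, b in BYTES_PER_PARAM.items():
    gb = weight_memory_gb(PARAMS, b)
    verdict = "weights fit" if gb < 16 else "weights exceed"
    print(f"{fmt:>9}: {gb:5.1f} GB -> {verdict} 16 GB (before KV cache/activations)")
```

At 16-bit precision the weights alone need about 28 GB, so fitting a 14B model into 16 GB suggests reduced-precision weights or offloading of some kind; the post does not specify which.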

## Application Scenarios, Limitations, and Future Directions

Application scenarios include SOC decision support (shortening response time), security training (using generated responses as learning material), and SOAR orchestration (integrating with automation platforms).

Limitations: coverage of attack types is still limited, the model is optimized mainly for English, and latency grows under large-scale concurrent load.

Future work: expanding the dataset with new attack types and industry scenarios, adding multi-language support, and applying model quantization and inference acceleration.
