Zing Forum


Cybersecurity Intelligent Assistant: Practice of a Dual-Model Dialogue System Based on RAG Architecture

This project builds a Retrieval-Augmented Generation (RAG) cybersecurity chatbot, which automatically distributes queries between LLaMA 3.1 and DeepSeek-Coder via an intelligent routing mechanism, providing professional support for security testing and CTF competitions.

Tags: cybersecurity, RAG, LLaMA, DeepSeek, large language models, chatbot, penetration testing, CTF, FAISS, LoRA
Published 2026-05-14 14:54 · Recent activity 2026-05-14 15:05 · Estimated read: 7 min

Section 01

[Main Floor/Introduction] Cybersecurity Intelligent Assistant: Practice of a Dual-Model Dialogue System Based on RAG Architecture

This project builds a cybersecurity intelligent dialogue assistant based on the Retrieval-Augmented Generation (RAG) architecture. An intelligent routing mechanism automatically distributes queries between two models, LLaMA 3.1 and DeepSeek-Coder, providing professional support for security testing and CTF competitions. The project combines a domain knowledge base, parameter-efficient LoRA fine-tuning, and Docker containerized deployment to deliver accurate, real-time technical responses, while emphasizing privacy protection (the knowledge base is processed locally). The floors below cover the project background, architecture design, technical details, and application value.


Section 02

Project Background and Design Ideas

Knowledge in the cybersecurity field updates quickly and demands deep expertise. General-purpose AI assistants have clear pain points here: their domain knowledge is shallow, and they cannot handle code questions and conceptual questions in a targeted way. The core design ideas of this project are 'specialization' and 'intelligent routing': instead of relying on a single model, intelligent routing selects the best-suited model for each query type, balancing expertise and resource efficiency.


Section 03

Dual-Model Architecture and Intelligent Routing Mechanism

The project adopts dual-model collaboration: LLaMA 3.1 8B Instruct handles conceptual questions (such as SQL injection explanation, Nmap configuration), while DeepSeek-Coder 6.7B focuses on code-related queries (exploit scripts, PoC code). The intelligent routing is implemented by the RAG_Router module, which classifies queries through keyword matching and syntax analysis—code-related queries are routed to DeepSeek-Coder, conceptual ones to LLaMA; identity queries (e.g., 'Who are you?') use hard-coded responses to save resources.
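A minimal sketch of what such keyword-based routing could look like. The keyword lists, route labels, and the `route_query` helper are illustrative assumptions for this post, not the project's actual RAG_Router code:

```python
# Hypothetical sketch of keyword-based query routing (illustrative only).
CODE_KEYWORDS = {"exploit", "script", "poc", "payload", "code", "python", "bash"}
IDENTITY_QUERIES = {"who are you", "what are you"}

def route_query(query: str) -> str:
    """Return which backend should answer the query."""
    q = query.lower().strip().rstrip("?")
    if q in IDENTITY_QUERIES:
        return "hardcoded"        # canned identity response, no model call
    if any(kw in q for kw in CODE_KEYWORDS):
        return "deepseek-coder"   # code-related -> DeepSeek-Coder 6.7B
    return "llama-3.1"            # conceptual -> LLaMA 3.1 8B Instruct

print(route_query("Who are you?"))               # hardcoded
print(route_query("Write a PoC exploit script")) # deepseek-coder
print(route_query("Explain SQL injection"))      # llama-3.1
```

A real router would add syntax analysis (e.g. detecting code fences or shell syntax in the query) on top of this keyword pass, but the dispatch structure stays the same.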


Section 04

RAG Architecture and Customized Knowledge Base

The project is based on the RAG architecture: before generating an answer, it retrieves relevant information from an external knowledge base. The knowledge base is a customized cybersecurity guide (penetration testing, CTF skills, macOS 15+ command tools), which is converted into vectors via SentenceTransformers and stored in a FAISS database. When a user asks a question, the system encodes the question into a vector, performs a similarity search in FAISS to obtain relevant fragments, and uses them as context input to the model to generate an accurate answer.
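The retrieval step can be illustrated with a toy cosine-similarity search. In the real pipeline the vectors come from SentenceTransformers and live in a FAISS index; here random stand-in embeddings keep the sketch runnable without model downloads, and the documents are invented examples:

```python
import numpy as np

# Toy "embeddings": in the real pipeline these come from SentenceTransformers
# and are stored in a FAISS index; random unit vectors stand in here so the
# similarity-search logic itself is runnable.
rng = np.random.default_rng(0)
docs = ["SQL injection basics", "Nmap service scan flags", "LoRA adapters"]
doc_vecs = rng.normal(size=(len(docs), 8))
doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True)  # L2-normalize

def search(query_vec: np.ndarray, k: int = 2) -> list[str]:
    """Cosine-similarity search, the operation FAISS IndexFlatIP performs
    on normalized vectors."""
    q = query_vec / np.linalg.norm(query_vec)
    scores = doc_vecs @ q                    # inner product == cosine here
    top = np.argsort(scores)[::-1][:k]
    return [docs[i] for i in top]

# A query vector identical to doc 0's embedding ranks doc 0 first.
print(search(doc_vecs[0])[0])  # SQL injection basics
```

The top-k fragments returned by the search are then concatenated into the prompt as context before the routed model generates its answer.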


Section 05

LoRA Fine-Tuning and Deployment Plan

To adapt to domain requirements, the project uses LoRA technology for model fine-tuning: it introduces low-rank matrices to update a small number of parameters, retaining the original capabilities while learning domain knowledge. The LoRA adapter is stored in the outputs directory and can be dynamically loaded. For deployment, it provides a FastAPI web interface (supporting remote SSH port forwarding) and a Docker containerized one-click startup (Python 3.10 environment to ensure compatibility).
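The LoRA idea can be sketched numerically: the frozen base weight W is augmented by a scaled product of two trainable low-rank matrices, so only r*(d_in+d_out) parameters are trained instead of d_in*d_out. All shapes and values below are illustrative, not the project's actual configuration:

```python
import numpy as np

# Conceptual LoRA update: instead of retraining the full weight W, train two
# low-rank factors A (r x d_in) and B (d_out x r); the effective weight is
# W + (alpha / r) * B @ A.
d_in, d_out, r, alpha = 16, 16, 4, 8
rng = np.random.default_rng(1)
W = rng.normal(size=(d_out, d_in))      # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable low-rank factor
B = np.zeros((d_out, r))                # B starts at zero -> no initial drift

def lora_forward(x: np.ndarray) -> np.ndarray:
    """Forward pass through the adapted layer."""
    return (W + (alpha / r) * B @ A) @ x

x = rng.normal(size=d_in)
# With B = 0 the adapted layer matches the base layer exactly, which is why
# a freshly initialized adapter leaves the model's behavior unchanged.
print(np.allclose(lora_forward(x), W @ x))  # True
```

Because only A and B are saved, the adapter in the outputs directory stays small and can be loaded onto the base model at startup rather than shipping a full fine-tuned checkpoint.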


Section 06

Privacy Protection and Application Scenario Value

For privacy protection, all operations are completed locally (knowledge base stored locally, no external API transmission); accessing gated models (such as LLaMA) requires Hugging Face Token authorization. Application scenarios include: student learning tutoring, CTF competition problem-solving support, and penetration testing technical reference. This 'general model + domain customization' paradigm has reference significance for AI applications in fields such as law and medicine.


Section 07

Summary of Technical Highlights and Project Value

Technical highlights of the project: dual-model intelligent routing, a complete RAG pipeline, parameter-efficient LoRA fine-tuning, Docker one-click deployment, and a macOS-customized knowledge base. Project value: a practical assistant for cybersecurity practitioners, a reference implementation for AI developers, and a worked RAG example for researchers. Domain-specific AI assistants built on this pattern are likely to find use in many more scenarios.