End-to-End Network Intrusion Detection System: A Complete Machine Learning Practice with 18 Model Configurations

This article presents a complete network intrusion detection system that classifies traffic as normal or attack. It covers traffic feature engineering, noise injection, and a comparison of 18 model configurations; the best model reaches an F1 score of 0.9092.

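Tags: intrusion detection, machine learning, network security, traffic analysis, neural networks, feature engineering, noise injection, binary classification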

Published 2026-05-22 01:15 | Recent activity 2026-05-22 01:18 | Estimated read 5 min

Section 01

[Introduction] End-to-End Network Intrusion Detection System Practice: 18 Model Comparisons and Key Results

The CS-324 Machine Learning course team at FAST-NUCES University developed an end-to-end network intrusion detection system covering the entire process from data collection to model deployment. The project combines traffic feature engineering, a noise injection strategy, and a comparison of 18 model configurations across three families (logistic regression, decision trees, and neural networks). The best model achieved an F1 score of 0.9092, making the project a practical example of machine learning applied to network security.

Section 02

Project Background and Core Objectives

Network intrusion detection is framed here as a binary classification problem (normal vs. attack traffic). The core goal is high recall, because a missed attack costs far more than a false alarm. The dataset contains 11,051 traffic samples with 9 engineered features. The project compares 18 model configurations: three algorithm families with three variants each, every variant trained under two data split ratios (70/15/15 and 80/10/10).
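A minimal sketch of how a split at one of these ratios might be produced while preserving class proportions. The function name `stratified_split`, the reading of the three ratio parts as train/validation/test, and the toy labels are assumptions for illustration, not the project's code.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios=(0.70, 0.15, 0.15), seed=42):
    """Return (train, val, test) index lists that keep class proportions."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    rng = random.Random(seed)

    # Group sample indices by class so each class is split separately.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * ratios[0])
        n_val = round(len(indices) * ratios[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Toy labels (0 = normal, 1 = attack); the counts are illustrative only.
labels = [0] * 472 + [1] * 528
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Passing `ratios=(0.80, 0.10, 0.10)` gives the second split used in the comparison.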

Section 03

Data Collection and Feature Engineering

The data was collected in a controlled laboratory environment using Wireshark/Tshark. Normal traffic includes Google Meet, HTTPS, and similar applications, while attack traffic was generated with Kali tools (SYN Flood, nmap scans, etc.). Nine key features were extracted per flow, including total number of packets, flow duration, and average packet length. The class distribution is roughly balanced (47.2% normal, 52.8% attack).
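To make the three named features concrete, here is a small sketch that computes them from a list of packet records. The `Packet` type and its fields are hypothetical stand-ins for whatever the capture export actually contains; the remaining six features are not listed in the article and are not guessed here.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    timestamp: float  # seconds since capture start (assumed unit)
    length: int       # packet length in bytes


def flow_features(packets):
    """Compute three per-flow features from a list of Packet records."""
    if not packets:
        raise ValueError("flow has no packets")
    times = [p.timestamp for p in packets]
    total_packets = len(packets)
    return {
        "total_packets": total_packets,
        "flow_duration": max(times) - min(times),
        "avg_packet_length": sum(p.length for p in packets) / total_packets,
    }


flow = [Packet(0.00, 60), Packet(0.25, 1500), Packet(0.50, 540)]
features = flow_features(flow)
print(features)
```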

Section 04

Data Preprocessing and Noise Injection Strategy

Preprocessing ran in three stages: cleaning missing values, removing features that leak the label, and eliminating highly correlated features. The noise injection step, described as a mutual information ratio scheme, adds Gaussian noise to each feature according to how strongly that feature relates to the label, and additionally flips 5% of the labels. Data was split with stratified sampling, and StandardScaler was fitted only on the training set to avoid leakage.
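The article does not give the exact noise formula, so the following is only one plausible reading, sketched under stated assumptions: each feature's noise standard deviation is a base level times that feature's share of the total mutual information times the feature's own standard deviation. The names `inject_noise`, `mi_scores`, and `base_sigma` are hypothetical, and the mutual information scores are taken as given rather than computed.

```python
import random
import statistics


def inject_noise(rows, labels, mi_scores, base_sigma=0.1, flip_rate=0.05, seed=42):
    """Return noisy copies of the feature rows and binary labels.

    Assumption (not from the source): feature j gets Gaussian noise with
    std = base_sigma * (mi_j / sum(mi)) * std(feature j).
    """
    total_mi = sum(mi_scores)
    if total_mi <= 0:
        raise ValueError("mi_scores must have a positive sum")
    rng = random.Random(seed)

    # Per-feature noise scale from the mutual information share.
    col_std = [statistics.pstdev([r[j] for r in rows]) for j in range(len(mi_scores))]
    sigmas = [base_sigma * (mi / total_mi) * s for mi, s in zip(mi_scores, col_std)]
    noisy_rows = [[x + rng.gauss(0.0, sd) for x, sd in zip(row, sigmas)] for row in rows]

    # Flip a fixed fraction of the 0/1 labels, chosen uniformly at random.
    n_flip = round(flip_rate * len(labels))
    flip_idx = set(rng.sample(range(len(labels)), n_flip))
    noisy_labels = [1 - y if i in flip_idx else y for i, y in enumerate(labels)]
    return noisy_rows, noisy_labels


# Toy data: 100 samples, 2 features, illustrative MI scores.
rows = [[float(i), float(i % 7)] for i in range(100)]
labels = [i % 2 for i in range(100)]
noisy_rows, noisy_labels = inject_noise(rows, labels, mi_scores=[0.3, 0.1])
print(sum(a != b for a, b in zip(labels, noisy_labels)), "labels flipped")
```

Under this reading, features that carry more label information are perturbed more, which makes it harder for a model to lean on a single near-perfect feature. The opposite scaling is equally easy to express by inverting the share.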

Section 05

Model Architecture and Training Configuration

The 18 model configurations span three families with three variants each: logistic regression (basic, L1, L2), tree-based models (basic decision tree, random forest, XGBoost/LightGBM), and neural networks (conservative, balanced, aggressive). Each of the nine variants was trained under both data splits with a fixed random seed of 42. Evaluation metrics include accuracy, recall, and F1, among others.
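The count of 18 follows from 3 families x 3 variants x 2 splits. A short sketch of enumerating that grid is shown here; the dictionary keys and variant identifiers are illustrative names, not the project's.

```python
from itertools import product

# Variant names follow the article; the identifiers themselves are made up.
FAMILIES = {
    "logistic_regression": ["basic", "l1", "l2"],
    "decision_tree": ["basic", "random_forest", "xgboost_lightgbm"],
    "neural_network": ["conservative", "balanced", "aggressive"],
}
SPLITS = ["70/15/15", "80/10/10"]
SEED = 42

# One configuration per (family, variant, split) combination.
configs = [
    {"family": family, "variant": variant, "split": split, "seed": SEED}
    for family, variants in FAMILIES.items()
    for variant, split in product(variants, SPLITS)
]

print(len(configs))
```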

Section 06

Visualization Analysis and Best Model Results

ROC and precision-recall curves, confusion matrices, and feature importance plots were used to diagnose the models. The best model was the aggressive neural network on the 80/10/10 split, with F1 = 0.9092 and AUC = 0.9377. The best F1 scores for logistic regression and decision trees both exceeded 0.85. The article credits noise injection with helping to avoid overfitting.
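For readers unfamiliar with the two headline metrics, the sketch below computes precision, recall, and F1 from predictions, and ROC AUC through its pairwise-ranking interpretation. The toy labels and scores are invented for illustration and are unrelated to the reported results.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for the positive (attack = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def roc_auc(y_true, scores):
    """AUC as the chance a random attack outscores a random normal flow.

    Ties count as half a win. This O(n_pos * n_neg) form is for clarity.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: threshold the scores at 0.5 to get hard predictions.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(round(f1, 4), round(auc, 4))
```

F1 depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why the two are reported together.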

Section 07

Limitations and Future Work Recommendations

Limitations: the dataset is small (11,051 samples) and covers a narrow range of attack types (flooding and scanning). Future directions: introduce more attack types and network topologies; try Transformer models on raw packets; explore online learning; and run deployment tests in real environments.

Section 08

Summary and Insights

The project is a useful worked example for security ML applications and gives learners a complete problem-solving framework. Its key insights: high-quality feature engineering is the foundation; noise injection and strict data splitting improve generalization; comparing multiple models helps identify the best performer; and visualization is an important tool for model diagnosis.
