Zing Forum

End-to-End Network Intrusion Detection System: A Complete Machine Learning Practice with 18 Model Configurations

This article presents a complete network intrusion detection system that classifies traffic as normal or attack. It combines traffic feature engineering, noise injection, and a comparison of 18 model configurations; the best model achieves an F1 score of 0.9092.

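Tags: intrusion detection, machine learning, network security, traffic analysis, neural networks, feature engineering, noise injection, binary classification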
Published 2026-05-22 01:15 | Recent activity 2026-05-22 01:18 | Estimated read 5 min

Section 01

[Introduction] End-to-End Network Intrusion Detection System Practice: 18 Model Comparisons and Key Results

The CS-324 Machine Learning course team at FAST-NUCES University developed an end-to-end network intrusion detection system that covers the full pipeline from data collection to model deployment. The project combines traffic feature engineering, a noise injection strategy, and a comparison of 18 model configurations across three families (logistic regression, decision trees, and neural networks). The best model achieved an F1 score of 0.9092, making the project a practical example of machine learning applied to network security.

Section 02

Project Background and Core Objectives

Network intrusion detection is essentially a binary classification problem: each traffic sample is labeled normal or attack. The core goal is high recall, because a missed attack costs far more than a false alarm. The project dataset contains 11,051 traffic samples with 9 engineered features. The 18 model configurations come from three algorithm families with three variants each, trained under two data split ratios (70/15/15 and 80/10/10), so 3 × 3 × 2 = 18.
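To make the recall-first goal concrete, here is a minimal sketch of how precision, recall, and F1 follow from the confusion counts. The function name `binary_metrics` and the toy labels are made up for illustration and are not taken from the project.

```python
def binary_metrics(y_true, y_pred):
    """Return (precision, recall, f1) for the positive class (1 = attack)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed attacks
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels (illustrative only): 1 = attack, 0 = normal.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
precision, recall, f1 = binary_metrics(y_true, y_pred)
```

In this toy case one of four attacks is missed, so recall is 0.75. Each additional missed attack lowers recall directly, which is why the project treats it as the primary metric.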

Section 03

Data Collection and Feature Engineering

The data was captured in a controlled lab environment using Wireshark/Tshark. Normal traffic includes Google Meet, HTTPS, and similar everyday use, while attack traffic was generated with Kali tools (SYN flood, nmap scans, and so on). Nine key features were extracted, including total number of packets, flow duration, and average packet length. The class distribution is roughly balanced (47.2% normal, 52.8% attack).
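As a rough illustration of how the three named features could be derived from captured packets, the sketch below groups packets into flows and aggregates them. The packet tuple layout, the 5-tuple flow key, and the sample values are assumptions for this example; the article does not say how the team defined a flow.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical packet records:
# (timestamp_s, src_ip, dst_ip, src_port, dst_port, protocol, length_bytes)
packets = [
    (0.00, "10.0.0.2", "10.0.0.9", 40000, 443, "TCP", 60),
    (0.50, "10.0.0.2", "10.0.0.9", 40000, 443, "TCP", 1500),
    (2.00, "10.0.0.2", "10.0.0.9", 40000, 443, "TCP", 540),
    (0.10, "10.0.0.5", "10.0.0.9", 51000, 80, "TCP", 60),
    (0.11, "10.0.0.5", "10.0.0.9", 51000, 80, "TCP", 60),
]


def flow_features(packets):
    """Group packets by an assumed 5-tuple key and compute per-flow features."""
    flows = defaultdict(list)
    for ts, src, dst, sport, dport, proto, length in packets:
        flows[(src, dst, sport, dport, proto)].append((ts, length))

    features = {}
    for key, pkts in flows.items():
        times = [ts for ts, _ in pkts]
        lengths = [length for _, length in pkts]
        features[key] = {
            "total_packets": len(pkts),
            "flow_duration": max(times) - min(times),  # seconds
            "avg_packet_length": mean(lengths),        # bytes
        }
    return features


features = flow_features(packets)
```

In a real capture the packet records would come from a Tshark export rather than a hard-coded list.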

Section 04

Data Preprocessing and Noise Injection Strategy

Preprocessing ran in three stages: cleaning missing values, removing features that leak the label, and eliminating highly correlated features. The team then applied what it calls mutual information ratio noise injection: Gaussian noise is added to each feature according to how strongly that feature is associated with the label, and 5% of the labels are flipped. Data was split with stratified sampling, and StandardScaler was fitted on the training set only to avoid leakage.
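The article does not give the exact noise formula, so the following is only one plausible reading of "mutual information ratio" noise injection. It assumes each feature's noise standard deviation is its share of the total mutual information, multiplied by the feature's own standard deviation and a hypothetical `noise_level` knob. The function name `inject_noise` and the synthetic data are also illustrative.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif


def inject_noise(X, y, noise_level=0.5, flip_rate=0.05, seed=42):
    """Return noisy copies of X and y.

    Assumed scheme (not confirmed by the source): noise std for feature j is
    noise_level * (MI_j / sum of MI) * std(X[:, j]).
    """
    rng = np.random.default_rng(seed)
    mi = mutual_info_classif(X, y, random_state=seed)
    mi_ratio = mi / mi.sum() if mi.sum() > 0 else np.zeros_like(mi)
    sigma = noise_level * mi_ratio * X.std(axis=0)
    # Standard normal draws scaled column-wise by each feature's sigma.
    X_noisy = X + rng.normal(0.0, 1.0, size=X.shape) * sigma

    # Flip a fixed fraction of the binary labels (5% in the article).
    y_noisy = y.copy()
    n_flip = int(round(flip_rate * len(y)))
    flip_idx = rng.choice(len(y), size=n_flip, replace=False)
    y_noisy[flip_idx] = 1 - y_noisy[flip_idx]
    return X_noisy, y_noisy


# Synthetic demo data: feature 0 tracks the label, feature 1 is pure noise.
demo_rng = np.random.default_rng(0)
y = demo_rng.integers(0, 2, size=400)
X = np.column_stack([y + demo_rng.normal(0, 0.3, 400),
                     demo_rng.normal(0, 1, 400)])
X_noisy, y_noisy = inject_noise(X, y)
```

Under this reading, the most informative features receive the most noise, which makes it harder for a model to lean on a single near-perfect feature. Whether the team applied the noise before or after splitting is not stated in the article.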

Section 05

Model Architecture and Training Configuration

The 18 model configurations cover three families: logistic regression (basic, L1, L2), tree-based models (a basic decision tree, random forest, and XGBoost/LightGBM), and neural networks (conservative, balanced, aggressive). Each of the nine models was trained under both data splits with a fixed random seed of 42. Evaluation metrics include accuracy, recall, and F1, among others.
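The configuration grid and the leakage-safe split described in the article can be sketched as follows. The variant labels, the helper `split_and_scale`, the random demo data, and the reading of each ratio as train/validation/test are assumptions made for this example.

```python
from itertools import product

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

SEED = 42
FAMILIES = {  # variant labels are illustrative names, not the team's own
    "logistic_regression": ["basic", "l1", "l2"],
    "decision_tree": ["basic", "random_forest", "boosted"],
    "neural_network": ["conservative", "balanced", "aggressive"],
}
SPLITS = {"70/15/15": (0.70, 0.15, 0.15), "80/10/10": (0.80, 0.10, 0.10)}

# 3 families x 3 variants x 2 splits = 18 configurations.
configs = [
    (family, variant, split_name)
    for family, variants in FAMILIES.items()
    for variant, split_name in product(variants, SPLITS)
]


def split_and_scale(X, y, ratios, seed=SEED):
    """Stratified train/val/test split; the scaler sees training data only."""
    train_frac, val_frac, test_frac = ratios
    X_train, X_rest, y_train, y_rest = train_test_split(
        X, y, train_size=train_frac, stratify=y, random_state=seed)
    X_val, X_test, y_val, y_test = train_test_split(
        X_rest, y_rest, train_size=val_frac / (val_frac + test_frac),
        stratify=y_rest, random_state=seed)
    scaler = StandardScaler().fit(X_train)  # fit on the training set only
    return (scaler.transform(X_train), y_train,
            scaler.transform(X_val), y_val,
            scaler.transform(X_test), y_test)


# Random stand-in data with 9 features, matching the article's feature count.
rng = np.random.default_rng(SEED)
X = rng.normal(size=(1000, 9))
y = rng.integers(0, 2, size=1000)
X_train, y_train, X_val, y_val, X_test, y_test = split_and_scale(
    X, y, SPLITS["80/10/10"])
```

Fitting the scaler before splitting would let validation and test statistics influence the training features, which is the leakage the article warns about.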

Section 06

Visualization Analysis and Best Model Results

ROC and PR curves, confusion matrices, and feature importance plots were used to diagnose the models. The best model was the aggressive neural network trained on the 80/10/10 split, with F1 = 0.9092 and AUC = 0.9377. The best logistic regression and decision tree configurations each reached F1 scores above 0.85, and the article credits noise injection with preventing overfitting.
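For readers who want to reproduce this kind of diagnosis, here is a minimal sketch of computing ROC AUC, average precision (the usual summary of a PR curve), and a confusion matrix with scikit-learn. The scores below are invented stand-ins for a model's predicted attack probabilities, and the 0.5 threshold is an assumption.

```python
import numpy as np
from sklearn.metrics import (average_precision_score, confusion_matrix,
                             roc_auc_score)

# Invented labels and scores (1 = attack, 0 = normal), for illustration only.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.1, 0.3, 0.6, 0.2, 0.8, 0.4, 0.9, 0.7])
y_pred = (y_score >= 0.5).astype(int)  # assumed decision threshold

# For binary labels, ravel() yields counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

report = {
    "roc_auc": roc_auc_score(y_true, y_score),           # threshold-free
    "pr_auc": average_precision_score(y_true, y_score),  # threshold-free
    "confusion": (int(tn), int(fp), int(fn), int(tp)),   # at the threshold
}
```

AUC and average precision are computed from the raw scores, so they compare models independently of any threshold, while the confusion matrix shows how many attacks are missed at the chosen operating point.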

Section 07

Limitations and Future Work Recommendations

Limitations: the dataset is small (11,051 samples) and covers a limited range of attack types (flooding and scanning). Future directions: add more attack types and network topologies; try Transformer models on raw packets; explore online learning; run deployment tests in real environments.

Section 08

Summary and Insights

The project is a useful worked example for security-focused ML and gives learners a complete problem-solving framework. Its key insights are that high-quality feature engineering is the foundation, noise injection and strict data splitting improve generalization, comparing multiple models helps identify the best option, and visualization is an important tool for model diagnosis.