Machine Learning-Based Network Intrusion Detection System: From NumPy Implementation to Explainable AI

This article provides an in-depth analysis of an open-source network anomaly detection project. The project uses the CICIDS 2017 dataset to compare six machine learning models, including logistic regression and MLP neural networks implemented from scratch with NumPy. It integrates SMOTE for data balancing and SHAP for explainability analysis, offering a complete IEEE-standard evaluation scheme for the cybersecurity domain.

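Tags: network intrusion detection, anomaly detection, machine learning, CICIDS 2017, SMOTE, SHAP, explainable AI, logistic regression, neural networks, cybersecurity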

Published 2026-05-01 21:43 | Recent activity 2026-05-01 21:54 | Estimated read 6 min

Section 01

Introduction: A Complete Solution for Machine Learning-Based Network Intrusion Detection

The open-source project discussed in this article is an end-to-end network intrusion detection pipeline. It uses the CICIDS 2017 dataset to compare six machine learning models (including logistic regression and an MLP neural network implemented from scratch with NumPy), applies SMOTE to handle class imbalance, and uses SHAP for explainability analysis. It reports results under what it describes as an IEEE-standard evaluation scheme, giving a reproducible reference for AI applications in cybersecurity.

Section 02

Background and Dataset Selection

Challenges in AI Transformation for Cybersecurity

Traditional rule-based Intrusion Detection Systems (IDS) struggle to keep up with evolving attack methods and zero-day vulnerabilities. Machine learning-based anomaly detection offers a way around this limitation by learning patterns from traffic data rather than relying only on hand-written rules.

Details of the CICIDS 2017 Dataset

The project uses the CICIDS 2017 dataset released by the Canadian Institute for Cybersecurity. It contains five days (Monday to Friday) of captured network traffic, covering benign traffic and attack types such as DoS/DDoS, port scanning, brute-force attacks, web attacks, and infiltration. It provides over 80 traffic features (e.g., flow duration and packet length statistics).

Section 03

Model Comparison and Data Imbalance Handling

Comparison of Six Models

The project compares six machine learning models: logistic regression (implemented from scratch with NumPy to expose mechanisms such as gradient descent and the sigmoid activation), an MLP neural network (forward and backward propagation written by hand in NumPy), a decision tree, a random forest, a support vector machine, and gradient-boosted trees. Together they span the range from linear models to ensemble methods.
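The from-scratch logistic regression mentioned above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the function names, the learning rate, the epoch count, and the toy two-feature data are all assumptions made for the example.

```python
import numpy as np


def sigmoid(z):
    # Clip the logit so np.exp cannot overflow for extreme values.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))


def train_logistic_regression(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)      # predicted attack probability per flow
        error = p - y               # gradient of the log loss w.r.t. the logit
        w -= lr * (X.T @ error) / n_samples
        b -= lr * error.mean()
    return w, b


def predict(X, w, b, threshold=0.5):
    return (sigmoid(X @ w + b) >= threshold).astype(int)


# Toy data standing in for two scaled flow features (hypothetical, not CICIDS).
rng = np.random.default_rng(0)
X_benign = rng.normal(loc=-1.0, scale=0.5, size=(100, 2))
X_attack = rng.normal(loc=1.0, scale=0.5, size=(100, 2))
X = np.vstack([X_benign, X_attack])
y = np.concatenate([np.zeros(100), np.ones(100)])

w, b = train_logistic_regression(X, y)
train_accuracy = (predict(X, w, b) == y).mean()
print(f"training accuracy: {train_accuracy:.2f}")
```

An MLP written by hand follows the same loop, with the single logit replaced by a chain of layers and the gradient propagated backward through each of them.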

Data Imbalance Solution

Because benign flows make up the overwhelming majority of network traffic, the project uses SMOTE (Synthetic Minority Over-sampling Technique) to generate synthetic samples for the minority (attack) classes, improving the model's sensitivity to rare attacks.
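The core idea of SMOTE is to place each synthetic sample somewhere on the line segment between a minority sample and one of its nearest minority neighbours. The NumPy sketch below illustrates that idea only. The function name `smote_sketch` and its parameters are hypothetical, and the article does not say which SMOTE implementation the project relies on.

```python
import numpy as np


def smote_sketch(X_minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority samples by interpolating toward near neighbours."""
    rng = np.random.default_rng(seed)
    n = X_minority.shape[0]
    k = min(k, n - 1)               # cannot have more neighbours than other points
    # Pairwise Euclidean distances within the minority class.
    diffs = X_minority[:, None, :] - X_minority[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_synthetic)   # sample each synthetic point starts from
    pick = rng.integers(0, k, size=n_synthetic)   # which of its k neighbours to move toward
    target = neighbours[base, pick]
    gap = rng.random((n_synthetic, 1))            # position along the segment, in [0, 1)
    return X_minority[base] + gap * (X_minority[target] - X_minority[base])


# Toy minority class: 10 attack flows with 3 features (hypothetical values).
rng = np.random.default_rng(42)
X_attack = rng.normal(size=(10, 3))
X_synthetic = smote_sketch(X_attack, n_synthetic=40)
print(X_synthetic.shape)
```

Oversampling of this kind is normally applied to the training split only, so that the test set still reflects the original class proportions.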

Section 04

Explainable AI and Detection Task Design

Explainable AI: Application of SHAP Values

To solve the model black-box problem, SHAP values are introduced:

Quantify the contribution of features to predictions (positive values promote attack determination, negative values suppress it);
Provide local explanations for individual instances;
Visualize feature impacts via summary plots and force plots.
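For a linear model whose features are treated as independent, SHAP values have a closed form: the contribution of feature i to the logit is w_i * (x_i - mean(x_i)). The sketch below uses that special case to show how signed contributions add up to a single prediction. It does not call the shap library, and the weights, feature names, and data are hypothetical.

```python
import numpy as np


def linear_shap_values(w, X_background, x):
    """SHAP values of a linear model's logit, assuming independent features."""
    return w * (x - X_background.mean(axis=0))


# Hypothetical weights for three scaled flow features.
feature_names = ["flow_duration", "packet_length_mean", "syn_flag_count"]
w = np.array([0.8, -0.5, 1.2])
b = -0.3

rng = np.random.default_rng(1)
X_background = rng.normal(size=(200, 3))   # stand-in for training flows
x = np.array([1.5, 0.2, 2.0])              # the single flow being explained

phi = linear_shap_values(w, X_background, x)
base_value = X_background.mean(axis=0) @ w + b   # average logit over the background

for name, value in zip(feature_names, phi):
    direction = "toward attack" if value > 0 else "toward benign"
    print(f"{name:>20}: {value:+.3f} ({direction})")

# Local accuracy: base value plus all contributions equals this flow's logit.
print(np.isclose(base_value + phi.sum(), x @ w + b))
```

The per-feature values for one flow are what a force plot draws, and stacking them across many flows gives the data behind a summary plot.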

Detection Task Design

Two modes are supported:

Binary classification: Normal vs. Attack (for quick alerts);
Multi-class classification: Distinguish attack types (e.g., SQL injection, DDoS) to support refined responses.
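The two modes differ only in how the raw flow labels are encoded before training. A minimal sketch, assuming the benign class is labelled "BENIGN" and using a few illustrative label strings (the helper names are hypothetical):

```python
def to_binary(labels, benign_label="BENIGN"):
    """Map benign flows to 0 and every other label to 1 (attack)."""
    return [0 if label == benign_label else 1 for label in labels]


def to_multiclass(labels):
    """Give each distinct label an integer id, sorted so the mapping is stable."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    return [index[label] for label in labels], classes


# Illustrative labels; the benign label name is an assumption about the data files.
raw_labels = ["BENIGN", "DDoS", "BENIGN", "PortScan", "DDoS"]

y_binary = to_binary(raw_labels)
y_multi, class_names = to_multiclass(raw_labels)
print(y_binary)               # [0, 1, 0, 1, 1]
print(y_multi, class_names)   # [0, 1, 0, 2, 1] ['BENIGN', 'DDoS', 'PortScan']
```

The same feature matrix can then feed either task, with only the target vector swapped.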

Section 05

Evaluation System and Engineering Practical Value

IEEE-Standard Evaluation System

Following IEEE specifications, the project reports a comprehensive set of metrics: accuracy, precision, recall, F1 score, confusion matrix, ROC curve, and AUC, so that results are credible and reproducible.
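These metrics all derive from the confusion matrix and the ranking of predicted scores. The sketch below computes them for the binary task on a small hand-made example. The function names and the data are hypothetical, and AUC is computed here through its pairwise-ranking interpretation rather than by tracing the ROC curve.

```python
import numpy as np


def binary_metrics(y_true, y_pred):
    """Confusion matrix and the threshold-dependent metrics derived from it."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "confusion_matrix": [[tn, fp], [fn, tp]],   # rows: true class, columns: predicted
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """AUC as the chance that a random attack outscores a random benign flow."""
    y_true, scores = np.asarray(y_true), np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))


# Hand-made example: 6 benign flows (0) and 4 attack flows (1).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.35, 0.8, 0.9, 0.65]
y_pred = [int(s >= 0.5) for s in scores]

metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics)
print(f"AUC: {auc:.3f}")
```

On heavily imbalanced traffic, accuracy alone can look high even when most attacks are missed, which is why recall and F1 are reported alongside it.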

Engineering Practical Value

Educational significance: Implementing models from scratch deepens understanding of algorithms;
End-to-end process: Covers the entire pipeline from data preprocessing to model interpretation;
Open-source collaboration: Allows the community to reproduce and improve, driving progress in the field.

Section 06

Summary and Future Directions

Network intrusion detection is a typical application of machine learning in the security field. This project demonstrates a complete technical path from data preprocessing to model interpretation. As AI-based detection systems grow in importance, the balance between explainability, robustness, and real-time performance will need continued exploration.