Zing Forum

Network Flow Integration Enhances Generalization Capability of Machine Learning-based Intrusion Detection Systems

Supplementary material for an SBSeg 2026 conference paper. The paper proposes a network flow integration method that improves how well machine learning-based intrusion detection systems (ML-IDS) generalize in multi-domain data scenarios, and it uses the NFStream tool to standardize multiple public datasets.

Tags: intrusion detection, IDS, machine learning, network security, generalization, NFStream, cybersecurity, multi-domain learning, traffic analysis, SBSeg
Published 2026-06-10 07:14 | Recent activity 2026-06-10 07:22 | Estimated read 7 min

Section 01

[Introduction] Network Flow Integration Enhances Generalization Capability of ML-based Intrusion Detection Systems (SBSeg2026 Supplementary Material)

This article is supplementary material for the SBSeg 2026 conference paper. Its core proposal is to enhance the generalization capability of machine learning-based intrusion detection systems (ML-IDS) in multi-domain data scenarios through a network flow integration method. The study uses the NFStream tool to standardize multiple public datasets, addressing the problem that models trained on a single dataset struggle to adapt across different environments.

Section 02

Research Background and Motivation

As network threats grow and attack methods evolve, traditional rule-based IDS struggle to handle complex environments. ML brings new possibilities to IDS, but existing models generalize poorly for three reasons: training on a single dataset easily overfits to one specific network environment, different datasets use inconsistent feature spaces, and performance degrades significantly when a model is deployed in a different domain. This study aims to improve cross-domain generalization by integrating flow-level features from multiple datasets.

Section 03

Core Method: Network Flow Integration Scheme

To address the limitations of traditional ML-IDS, the study proposes a network flow integration method with three core components:

  1. Multi-dataset integration: uniformly processing network traffic data from different sources;
  2. Feature standardization: using the NFStream tool to extract consistent flow-level features from raw PCAP files;
  3. Construction of a common feature space: supporting cross-domain generalization experiments.
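
The idea of a common feature space can be sketched as follows. This is a minimal illustration and not the authors' pipeline. The function name `build_common_feature_space`, the `domain` column, and the toy rows are all assumptions made for the example. The sketch keeps only the columns that every flow row shares and tags each row with the dataset it came from.

```python
from typing import Dict, List

Row = Dict[str, object]


def build_common_feature_space(datasets: Dict[str, List[Row]]) -> List[Row]:
    """Merge per-dataset flow rows, keeping only columns present in every row.

    Hypothetical helper: each output row also gets a "domain" field naming
    the dataset it came from, so later experiments can split by domain.
    """
    shared = None
    for rows in datasets.values():
        for row in rows:
            cols = set(row)
            shared = cols if shared is None else shared & cols
    shared_cols = sorted(shared or [])

    merged: List[Row] = []
    for domain, rows in datasets.items():
        for row in rows:
            out = {col: row[col] for col in shared_cols}
            out["domain"] = domain  # assumed bookkeeping column
            merged.append(out)
    return merged


# Toy rows, invented for illustration only.
example = {
    "UNSW-NB15": [
        {"bidirectional_packets": 10, "bidirectional_bytes": 840,
         "label": 0, "extra_col": 1},
    ],
    "CIC-IDS2017": [
        {"bidirectional_packets": 3, "bidirectional_bytes": 180, "label": 1},
    ],
}
merged = build_common_feature_space(example)
```

When every dataset is extracted from PCAP with the same tool and settings, the shared column set should already be the full column set. The intersection step then acts only as a safety check.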

Section 04

Detailed Explanation of NFStream Feature Extraction Framework

The study uses NFStream to extract standardized features, which are divided into three categories:

  • Core Features: Flow identification information (source/destination IP, port, protocol), duration, number of packets/bytes;
  • Post-mortem Statistical Features (sometimes rendered as "post-hoc"): Packet length statistics (minimum/maximum/average/standard deviation), inter-arrival time statistics, traffic direction ratio, all computed once a flow has ended;
  • Application Identification Features: The application name and category that NFStream infers for each flow. The traffic category (normal/attack) and the detailed attack type are ground-truth labels that come from each dataset and are attached to the flows separately.

The standardization process lays the foundation for cross-domain learning.
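
A minimal sketch of how the three feature groups might be pulled from NFStream flow objects is shown below. The attribute names follow NFStream's flow fields as documented, and the statistical ones require `statistical_analysis=True`. The helper names, the derived `fwd_packet_ratio` column (one possible reading of "traffic direction ratio"), and the stand-in flow object are assumptions for illustration. The exact feature list used in the paper is not given here.

```python
from types import SimpleNamespace

# Core flow features.
CORE = ["src_ip", "dst_ip", "src_port", "dst_port", "protocol",
        "bidirectional_duration_ms", "bidirectional_packets",
        "bidirectional_bytes"]
# Post-mortem statistical features (packet size and inter-arrival time).
STATS = ["bidirectional_min_ps", "bidirectional_mean_ps",
         "bidirectional_stddev_ps", "bidirectional_max_ps",
         "bidirectional_min_piat_ms", "bidirectional_mean_piat_ms",
         "bidirectional_stddev_piat_ms", "bidirectional_max_piat_ms",
         "src2dst_packets", "dst2src_packets"]
# Application identification features.
APP = ["application_name", "application_category_name"]


def flow_to_row(flow):
    """Copy selected attributes of a flow object into a plain dict."""
    row = {name: getattr(flow, name, None) for name in CORE + STATS + APP}
    total = row["bidirectional_packets"] or 0
    forward = row["src2dst_packets"] or 0
    # Hypothetical derived feature: share of packets sent source -> destination.
    row["fwd_packet_ratio"] = forward / total if total else 0.0
    return row


def extract_rows(pcap_path):
    """Run NFStream over a PCAP file and return one dict per flow."""
    from nfstream import NFStreamer  # third-party; imported only when called
    streamer = NFStreamer(source=pcap_path, statistical_analysis=True)
    return [flow_to_row(flow) for flow in streamer]


# Stand-in flow object so the row-building logic can be tried without a PCAP.
fake_flow = SimpleNamespace(src_ip="10.0.0.1", dst_ip="10.0.0.2",
                            src_port=51000, dst_port=443, protocol=6,
                            bidirectional_packets=4, src2dst_packets=3,
                            dst2src_packets=1)
row = flow_to_row(fake_flow)
```

Attack labels do not appear in this sketch because they are not produced by the extractor. They would be joined onto the rows afterwards from each dataset's ground-truth files.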

Section 05

Description of Experimental Datasets

The study uses three public datasets:

  1. UNSW-NB15: Released by the Australian Centre for Cyber Security, containing 9 attack types and combining real normal traffic with synthetically generated attack traffic;
  2. CIC-IDS2017: Released by the Canadian Institute for Cybersecurity, containing complete PCAP files and various modern attacks;
  3. CIC-IDS2018 (officially CSE-CIC-IDS2018): A successor to the 2017 version, with a larger scale and more complex attack scenarios.

Note: The original data is not redistributed due to license restrictions. It must be obtained from the official sources and then processed with the workflow in this repository to generate the standardized version.
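
One common way to use several datasets in a cross-domain experiment is a leave-one-domain-out split: train on all datasets but one, then test on the one held out. Whether the paper uses exactly this protocol is not stated here, so the sketch below is an assumption. The function name, the `domain` key, and the toy rows are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

Row = Dict[str, object]


def leave_one_domain_out(
    rows: List[Row], domain_key: str = "domain"
) -> Iterator[Tuple[str, List[Row], List[Row]]]:
    """Yield (held_out_domain, train_rows, test_rows) for each domain."""
    domains = sorted({str(row[domain_key]) for row in rows})
    for held_out in domains:
        train = [r for r in rows if r[domain_key] != held_out]
        test = [r for r in rows if r[domain_key] == held_out]
        yield held_out, train, test


# Toy rows, invented for illustration only.
rows = [
    {"domain": "UNSW-NB15", "label": 0},
    {"domain": "CIC-IDS2017", "label": 1},
    {"domain": "CIC-IDS2018", "label": 0},
    {"domain": "CIC-IDS2018", "label": 1},
]
splits = list(leave_one_domain_out(rows))
```

Comparing the held-out score of a model trained on the other domains against the score of a model trained on a single domain gives a direct measure of how much integration helps.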

Section 06

Experimental Results and Key Findings

The experiments verify the effectiveness of the network flow integration method:

  1. Improved Generalization Capability: Models trained on multiple datasets adapt significantly better across different data domains than models trained on a single dataset;
  2. Performance Stability: The performance drop seen when a model is deployed cross-domain is significantly reduced;
  3. Trade-off Relationship: The attack detection rate and the false positive rate can be traded off against each other, so operators can tune the balance to their needs.

These findings help security operation centers (SOCs) build more robust detection models.
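
The trade-off between detection rate and false positive rate can be illustrated by sweeping a decision threshold over classifier scores. This is a generic sketch and not the paper's evaluation code. The function name, scores, labels, and thresholds are invented for the example, with label 1 meaning attack and 0 meaning benign.

```python
from typing import Sequence, Tuple


def rates_at_threshold(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Return (detection_rate, false_positive_rate) at a given threshold.

    A flow is flagged as an attack when its score is >= threshold.
    """
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if label == 1 and flagged:
            tp += 1
        elif label == 1:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return detection_rate, false_positive_rate


# Toy scores and labels, invented for illustration only.
scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 0]
sweep = {t: rates_at_threshold(scores, labels, t) for t in (0.25, 0.5, 0.75)}
```

On this toy data, lowering the threshold catches every attack but flags more benign flows, and raising it removes the false alarms at the cost of a missed attack.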

Section 07

Application Prospects and Practical Value

The application scenarios of this research methodology include:

  • Security Operation Centers (SOCs): Integrate traffic from multiple clients to train general models, reducing retraining costs;
  • Cloud Security Service Providers: Build unified cross-tenant threat detection capabilities, improving the onboarding efficiency of new tenants;
  • Academic Research: Provide standardized experimental benchmarks to promote fair comparison of different methods.

Section 08

Summary and Insights

The network flow integration method reflects a shift in IDS research from "single-dataset optimization" to "cross-domain generalization". By combining standardized feature extraction with joint training on multiple datasets, it offers a feasible path to more robust models. For cybersecurity practitioners, ML researchers, and AI security developers, this work provides a technical reference and an experimental foundation, and research on generalization capability is likely to become increasingly important.