# Dual-Layer Adaptive Cybersecurity Detection System: Using NLP and Machine Learning to Combat Evolving Social Engineering Attacks

> This article introduces a dual-layer adaptive system combining a random forest classifier and an expert rule engine, which can classify emails, SMS, and chat messages into 7 types of social engineering attacks with an accuracy rate of 98.18%

- 板块: [Openclaw Geo](https://www.zingnex.cn/en/forum/board/openclaw-geo)
- 发布时间: 2026-05-22T17:45:32.000Z
- 最近活动: 2026-05-22T17:49:44.707Z
- 热度: 152.9
- 关键词: cybersecurity, NLP, machine-learning, phishing-detection, social-engineering, random-forest, adaptive-learning, text-classification, fraud-detection
- 页面链接: https://www.zingnex.cn/en/forum/thread/nlp-8e850dbc
- Canonical: https://www.zingnex.cn/forum/thread/nlp-8e850dbc
- Markdown 来源: floors_fallback

---

## Core Guide to the Dual-Layer Adaptive Cybersecurity Detection System

This article introduces the dual-layer adaptive cybersecurity detection system developed by the Usha Martin University team, which combines a random forest classifier and an expert rule engine. It can identify 7 types of social engineering attacks in emails, SMS, and other messages with an accuracy rate of 98.18%. The system has adaptive learning capabilities and can be continuously updated to respond to evolving attack patterns, providing a practical and interpretable solution for social engineering attack detection.

## Project Background and Motivation

In today's digital society, social engineering attack methods are constantly evolving (e.g., using psychological manipulation such as a sense of urgency or authority). Traditional detection systems based on rules or single machine learning models struggle to keep up. Therefore, the team developed this dual-layer adaptive system to address the challenge of rapidly changing attack patterns while achieving high accuracy and adaptive capabilities.

## System Architecture: Dual-Layer Collaborative Detection Mechanism

The core of the system is a dual-layer architecture:
1. **Random Forest Classifier**: Trained on 197,909 samples, identifies 7 attack types (Safe, Phishing, Urgency Manipulation, Authority Impersonation, Financial Fraud, Malware/Suspicious Links, Credential Theft) with an accuracy rate of 98.18%.
2. **Nine-Rule Expert Engine**: Captures subtle attacks missed by ML. The rule table is as follows:
| Rule Name | Detection Target | Severity | Score |
|---------|---------|---------|------|
| Impersonation | Authority Impersonation | CRITICAL |85|
| Credential Theft | Urgent Request for OTP/Password |CRITICAL|90|
| Urgency Escalation | Time Pressure + Threat Combination |HIGH|70|
| Context Attack | Financial/Legal/Health/Work Bait |HIGH|65-70|
| Subtle Manipulation | Flattery, False Intimacy, Scarcity |MEDIUM-HIGH|20-60|
| Obfuscation | Distorted Text, Invisible Characters, URLs |HIGH-CRITICAL|65-90|
| Mixed Signal | Trust Vocabulary + Attack Vocabulary Combination |HIGH|70|
| Malware Install | Inducing App Installation / Fake Delivery Failure |HIGH|72|
| Safe Signals | Authentic Communication Patterns |LOW|-30|
**Decision Fusion**: Threat Score = ML Score ×50% + Rule Engine Score ×50%. CRITICAL rules can override ML predictions.

## Feature Engineering: Multi-Dimensional Text Understanding

The system constructs a 10,012-dimensional feature vector:
- **TF-IDF Features (10,000 dimensions)**: Implemented with scikit-learn, extracts unigrams/bigrams, and uses sublinear_tf to reduce the impact of high-frequency words.
- **Psychological Manipulation Features (8 dimensions)**: Includes credential phrases, authority markers, link urgency, etc., scaled by ×10 to enhance weight.
- **Attack Pattern Features (4 dimensions)**: Identifies technical methods such as obfuscation, mixed signals, subtle manipulation, and context attacks.

## Adaptive Learning Mechanism: Combating Attack Evolution

To address attack evolution, the system is designed with an adaptive mechanism:
- **Data Segmentation**: 197,909 samples are divided into T1 (training, 70%), T2a (drift set, simulating new attacks,15%), and T2b (testing,15%).
- **Experimental Results**: Performance improvement after adaptation:
| Metric | Before Adaptation (T1→T2b) | After Adaptation (T1+T2a→T2b) | Improvement |
|----|------------------|----------------------|----|
| Accuracy |97.65%|98.18%|+0.53%|
| Precision |97.64%|98.18%|+0.54%|
| Recall |97.65%|98.18%|+0.53%|
| F1 Score |0.9763|0.9818|+0.0055|
This proves the system can maintain detection timeliness by learning new samples.

## Technical Implementation and Application Value

**Tech Stack**: Python3.10+, scikit-learn1.8.0, NLTK3.9.4, Streamlit1.35+, pandas/NumPy.
**Preprocessing Flow**: Raw message → lowercase → URL marking → stopword filtering → lemmatization → feature extraction.
**Web Deployment**: Streamlit interface provides real-time threat score, risk level, attack category, detection evidence, and security recommendations.
**Dataset**: Integrates multi-source data such as CEAS_08, phishing_email, Enron, covering channels like emails and SMS.
**Insights**: Hybrid architecture (ML + rules) complements advantages; rule engine improves interpretability; adaptive capability is key to combating evolving attacks.

## Future Directions and Conclusion

**Future Directions**: Integrate more Hugging Face phishing datasets, explore deep learning models like BERT, develop a real-time streaming version.
**Conclusion**: Through carefully designed feature engineering, dual-layer architecture, and adaptive mechanism, the system builds a practical, interpretable, and evolvable detection platform, providing a reference case for security teams.
**Project Information**: Authors Mohammad Kaif et al., Institution Usha Martin University, License MIT, Code Repository: <https://github.com/kaif0102/Adaptive-Detection-of-Evolving-Language-Based-Cyber-Attacks>.
