
A Multi-Classification System Vulnerability Classification Method Based on Hierarchical Fine-Tuned Language Models

This paper proposes a method for classifying vulnerabilities across multiple classification systems using hierarchically fine-tuned language models. The method adapts to several vulnerability classification standards at once (such as CVE, CWE, and CVSS), improves classification accuracy through a hierarchical fine-tuning strategy, and offers a more intelligent solution for cybersecurity vulnerability management.

Vulnerability classification, cybersecurity, language models, CWE, CVE, fine-tuning, multi-task learning, threat intelligence

Published 2026-04-14 14:59 | Recent activity 2026-04-14 15:02 | Estimated read 9 min

Section 01

[Introduction] A Multi-Classification System Vulnerability Classification Method Based on Hierarchical Fine-Tuned Language Models

This paper proposes a method for classifying vulnerabilities across multiple classification systems using hierarchically fine-tuned language models. It adapts to several vulnerability classification standards at once, such as CVE, CWE, and CVSS. Its hierarchical fine-tuning strategy improves classification accuracy and addresses three problems: the heterogeneity of the classification systems, sparse labels, and complex hierarchical structures. The result is a more intelligent solution for cybersecurity vulnerability management.

Section 02

Importance of Vulnerability Classification and Existing Challenges

Core Roles of Vulnerability Classification

Risk Assessment: Determine the severity and potential impact of vulnerabilities, prioritize handling high-risk ones
Threat Intelligence: Standardized classification promotes information sharing and collaborative defense among organizations
Automated Response: Accurate classification is a prerequisite for automated scanning and remediation
Compliance Reporting: Meet regulatory requirements and provide structured security posture reports

Complexity of Existing Classification Systems

The cybersecurity field has multiple parallel classification systems (CVE, CWE, CVSS, CAPEC, ATT&CK), each with different focuses and complex mapping relationships. Traditional manual classification is time-consuming and labor-intensive, making it difficult to ensure consistency and accuracy.

Section 03

Detailed Explanation of the Hierarchical Fine-Tuned Language Model Method

Method Overview

The method uses the hierarchical structure of each classification system to guide language model fine-tuning, so that the model learns the hierarchical relationships between categories and classifies accurately across multiple systems.

Hierarchical Pre-training

Hierarchy-Aware Masked Language Model: Predict masked words while predicting their hierarchical categories
Cross-System Alignment: Use contrastive learning to bring related categories from different systems closer in the representation space
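
The cross-system alignment idea can be illustrated with a small contrastive objective. The sketch below assumes an InfoNCE-style loss over cosine similarities, which is one common choice and not necessarily the loss used in the paper. The names `info_nce`, `cwe_vecs`, and `capec_aligned`, and the toy two-dimensional embeddings, are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchors, candidates, positive_idx, temperature=0.1):
    """Mean InfoNCE loss (assumed objective, not confirmed by the source).

    anchors[i] is a category embedding from one system; candidates holds
    category embeddings from another system; positive_idx[i] is the index
    of the candidate that anchors[i] maps to.
    """
    total = 0.0
    for anchor, pos in zip(anchors, positive_idx):
        logits = [cosine(anchor, c) / temperature for c in candidates]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[pos]
    return total / len(anchors)

# Toy embeddings: two categories in each system (hypothetical values).
cwe_vecs = [[1.0, 0.0], [0.0, 1.0]]
capec_aligned = [[0.9, 0.1], [0.1, 0.9]]      # related categories lie close together
capec_misaligned = [[0.1, 0.9], [0.9, 0.1]]   # related categories lie far apart

loss_aligned = info_nce(cwe_vecs, capec_aligned, [0, 1])
loss_misaligned = info_nce(cwe_vecs, capec_misaligned, [0, 1])
print(loss_aligned < loss_misaligned)
```

Minimizing this loss pulls each category toward its mapped counterpart and pushes it away from the other candidates, which matches the stated goal of bringing related categories closer in the representation space.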

Progressive Fine-Tuning Strategy

Top-Down Fine-Tuning: Fine-tune from the top level of the hierarchy down to the lower levels, so the model masters coarse-grained classification before fine-grained classification
Cross-System Knowledge Distillation: Use teacher models from data-rich systems to guide learning in sparse systems
Constraint-Aware Loss Function: Introduce hierarchical constraints to ensure predictions comply with the system's hierarchical structure
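
A constraint-aware loss can take many forms, and the source does not give the exact formula. As a minimal sketch, the version below assumes a two-level hierarchy and adds a penalty whenever a child category receives more probability than its parent. The function name `constraint_aware_loss`, the `weight` parameter, and the toy hierarchy are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def constraint_aware_loss(parent_logits, child_logits, parent_of,
                          y_parent, y_child, weight=1.0):
    """Cross-entropy at both levels plus a hierarchy penalty (assumed form).

    parent_of[c] is the index of the parent category of child category c.
    """
    p_parent = softmax(parent_logits)
    p_child = softmax(child_logits)
    cross_entropy = -math.log(p_parent[y_parent]) - math.log(p_child[y_child])
    # A child should not be more probable than the parent it belongs to.
    penalty = sum(max(0.0, pc - p_parent[parent_of[c]])
                  for c, pc in enumerate(p_child))
    return cross_entropy + weight * penalty

# Toy hierarchy: children 0 and 1 belong to parent 0, children 2 and 3 to parent 1.
parent_of = [0, 0, 1, 1]

# The model favors parent 0 but child 2, whose parent is 1: a violation.
violating = constraint_aware_loss([3.0, 0.0], [0.0, 0.0, 3.0, 0.0],
                                  parent_of, y_parent=0, y_child=0)
# The model favors parent 0 and child 0: consistent with the hierarchy.
consistent = constraint_aware_loss([3.0, 0.0], [3.0, 0.0, 0.0, 0.0],
                                   parent_of, y_parent=0, y_child=0)
print(violating > consistent)
```

The penalty is zero when predictions respect the hierarchy, so it only changes the gradient for inconsistent outputs.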

Multi-Task Learning Architecture

Shared Encoder: Encode vulnerability description semantics based on CodeBERT/SecureBERT
Classification System-Specific Heads: Set up dedicated classification layers for each system
Cross-System Attention: Allow information interaction between different classification heads to improve performance using system correlations
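
The shared-encoder, per-system-head layout can be sketched without any deep learning library. The example below is an illustration only: a single tanh layer stands in for the CodeBERT/SecureBERT encoder, the weights are random, and cross-system attention is omitted. The class name `MultiSystemClassifier` and the class counts are hypothetical.

```python
import math
import random

def linear(weights, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

class MultiSystemClassifier:
    """One shared encoder feeding a dedicated head per classification system."""

    def __init__(self, input_dim, hidden_dim, classes_per_system, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]

        # Stand-in for a pretrained transformer encoder (assumption).
        self.encoder = init(hidden_dim, input_dim)
        # One classification layer per system, all reading the same encoding.
        self.heads = {name: init(n_classes, hidden_dim)
                      for name, n_classes in classes_per_system.items()}

    def encode(self, features):
        return [math.tanh(h) for h in linear(self.encoder, features)]

    def predict(self, features):
        hidden = self.encode(features)  # computed once, reused by every head
        return {name: softmax(linear(weights, hidden))
                for name, weights in self.heads.items()}

# Hypothetical sizes: 5 CWE classes and 4 CVSS severity bands.
model = MultiSystemClassifier(input_dim=4, hidden_dim=3,
                              classes_per_system={"CWE": 5, "CVSS": 4})
probabilities = model.predict([1.0, 0.0, 0.5, 0.2])
print({name: len(p) for name, p in probabilities.items()})
```

Because the encoding is computed once and reused by every head, adding a classification system costs only one extra output layer, which is the usual argument for this kind of design.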

Section 04

Experimental Evaluation Results and Analysis

Dataset and Setup

Datasets: NVD (200,000+ CVE records), CVEFixes (vulnerability descriptions + fix code), Devign (manually labeled tags)
Evaluation Metrics: Classification accuracy, hierarchical consistency, cross-system consistency
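
The source does not define hierarchical consistency precisely. One plausible reading, sketched below, is the fraction of predictions whose fine-grained category actually belongs to the predicted coarse-grained category. The function name `hierarchical_consistency` and the category names are hypothetical placeholders, not real CWE entries.

```python
def hierarchical_consistency(predictions, parent_of):
    """Fraction of (parent, child) predictions that respect the hierarchy.

    predictions: list of (predicted_parent, predicted_child) pairs.
    parent_of: maps each child category to its true parent category.
    """
    if not predictions:
        raise ValueError("predictions must not be empty")
    consistent = sum(1 for parent, child in predictions
                     if parent_of.get(child) == parent)
    return consistent / len(predictions)

# Hypothetical two-level hierarchy.
parent_of = {
    "sql_injection": "injection",
    "command_injection": "injection",
    "buffer_overflow": "memory_safety",
}

predictions = [
    ("injection", "sql_injection"),
    ("injection", "command_injection"),
    ("memory_safety", "buffer_overflow"),
    ("injection", "buffer_overflow"),   # violates the hierarchy
]

score = hierarchical_consistency(predictions, parent_of)
print(score)  # 0.75
```

Under this reading, the violation rate is one minus the consistency score.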

Key Results

Single-System Performance: 2-5% higher accuracy than standard fine-tuning
Multi-System Joint Classification: Maintains performance on each system while reducing parameter count and inference time
Hierarchical Consistency: Predictions comply with the hierarchy constraints almost fully, compared with a 15% violation rate for standard methods
Data Efficiency: More obvious advantages under sparse data, learning effective information from limited annotations

Section 05

Application Value and Industry Impact of the Method

Automated Vulnerability Management

Integrate into scanning/management systems to automatically classify from multiple dimensions, provide structured results and risk assessments, and reduce manual review

Threat Intelligence Enhancement

Multi-classification labeling enables platforms to provide richer structured information, allowing analysts to track CWE weakness trends or CAPEC attack techniques

Security Training and Knowledge Management

Hierarchical classification results serve as training materials to help developers understand vulnerability characteristics; structured knowledge bases facilitate the accumulation and management of security knowledge

Section 06

Current Limitations and Future Research Directions

Current Limitations

Evolution of Classification Systems: The model needs regular retraining to keep up with updates to the classification systems
Multilingual Support: Mainly targets English, with limited support for other languages
Code Context: Code information is not yet fully used

Future Directions

Continuous Learning: Adapt to new vulnerability patterns and system changes without full retraining
Multimodal Fusion: Combine text, code, exploit videos, and other multimodal information
Causal Reasoning: Understand the causal mechanisms of vulnerabilities and predict derivative vulnerabilities
Adversarial Robustness: Improve robustness against adversarial samples that manipulate vulnerability descriptions

Section 07

Conclusion: Future Outlook for Intelligent Vulnerability Management

The hierarchical fine-tuned language model method proposed in this study effectively addresses the problem of vulnerability classification in multi-classification systems, achieving significant progress in accuracy, consistency, and efficiency. As software complexity and security threats evolve, intelligent vulnerability management becomes increasingly important. We look forward to this research promoting the deep integration of cybersecurity and artificial intelligence, helping to build a safer digital world.