Zing Forum

A Multi-Classification System Vulnerability Classification Method Based on Hierarchical Fine-Tuned Language Models

This paper proposes a method for classifying vulnerabilities across multiple classification systems using hierarchically fine-tuned language models. The method adapts to several vulnerability standards at once (such as CVE, CWE, and CVSS), improves classification accuracy through a hierarchical fine-tuning strategy, and offers a more intelligent approach to cybersecurity vulnerability management.

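Tags: Vulnerability Classification, Cybersecurity, Language Models, CWE, CVE, Fine-Tuning, Multi-Task Learning, Threat Intelligence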
Published 2026-04-14 14:59 | Recent activity 2026-04-14 15:02 | Estimated read 9 min

Section 01

[Introduction] A Multi-Classification System Vulnerability Classification Method Based on Hierarchical Fine-Tuned Language Models

This paper proposes a method for classifying vulnerabilities across multiple classification systems using hierarchically fine-tuned language models. The model adapts to several standards at once, such as CVE, CWE, and CVSS. Its hierarchical fine-tuning strategy improves classification accuracy and addresses three obstacles: heterogeneity across classification systems, sparse labels, and complex hierarchical structures. The result is a more intelligent approach to cybersecurity vulnerability management.

Section 02

Importance of Vulnerability Classification and Existing Challenges

Core Roles of Vulnerability Classification

  • Risk Assessment: Determine the severity and potential impact of each vulnerability so that high-risk ones are handled first
  • Threat Intelligence: Standardized classification promotes information sharing and collaborative defense among organizations
  • Automated Response: Accurate classification is a prerequisite for automated scanning and remediation
  • Compliance Reporting: Meet regulatory requirements and provide structured security posture reports

Complexity of Existing Classification Systems

The cybersecurity field maintains several parallel classification systems (CVE, CWE, CVSS, CAPEC, ATT&CK), each with a different focus and with complex mappings between them. Traditional manual classification is time-consuming and labor-intensive, and it makes consistency and accuracy hard to guarantee.

Section 03

Detailed Explanation of the Hierarchical Fine-Tuned Language Model Method

Method Overview

The method uses the hierarchical structure of each classification system to guide language model fine-tuning, so that the model learns the hierarchical relationships between categories and classifies accurately across multiple systems.

Hierarchical Pre-training

  • Hierarchy-Aware Masked Language Model: Predict masked words while predicting their hierarchical categories
  • Cross-System Alignment: Use contrastive learning to bring related categories from different systems closer in the representation space
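
The cross-system alignment idea can be illustrated with an InfoNCE-style contrastive loss. This is a minimal sketch, not the paper's implementation: the source does not specify which contrastive objective is used, and the 2-d embeddings, the `temperature` value, and the choice of CWE and CAPEC as the paired systems are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    positives[i] is the related category for anchors[i]; every other
    positives[j] in the batch acts as a negative.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        peak = max(logits)  # subtract the max for numerical stability
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)

# Hypothetical embeddings: row i in each list describes the same concept.
cwe_vecs = [[1.0, 0.0], [0.0, 1.0]]
capec_aligned = [[0.9, 0.1], [0.1, 0.9]]   # related categories sit close together
capec_swapped = [[0.1, 0.9], [0.9, 0.1]]   # related categories sit far apart

aligned_loss = info_nce(cwe_vecs, capec_aligned)
swapped_loss = info_nce(cwe_vecs, capec_swapped)
print(aligned_loss < swapped_loss)
```

Minimizing this loss pulls each category toward its counterpart in the other system and pushes it away from unrelated ones, which is the behavior the bullet describes.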

Progressive Fine-Tuning Strategy

  • Top-Down Fine-Tuning: Fine-tune from the top of the category hierarchy down to the bottom levels, so the model masters coarse-grained classification before fine-grained categories
  • Cross-System Knowledge Distillation: Use teacher models from data-rich systems to guide learning in sparse systems
  • Constraint-Aware Loss Function: Introduce hierarchical constraints to ensure predictions comply with the system's hierarchical structure
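
One common way to build a constraint-aware loss is to add a penalty whenever a child category is predicted with higher probability than its parent. The sketch below assumes that formulation; the source does not give the exact loss, and the toy hierarchy in `PARENT`, the label names, and the `weight` parameter are hypothetical.

```python
import math

# Hypothetical two-level hierarchy: child label -> parent label.
PARENT = {"xss": "injection", "sqli": "injection", "oob_read": "memory"}

def bce(prob, target):
    """Binary cross-entropy for one label."""
    eps = 1e-12
    return -(target * math.log(prob + eps) + (1 - target) * math.log(1 - prob + eps))

def hierarchy_penalty(probs):
    """Sum of the amounts by which a child probability exceeds its parent's."""
    return sum(max(0.0, probs[child] - probs[parent]) for child, parent in PARENT.items())

def constraint_aware_loss(probs, targets, weight=1.0):
    """Mean per-label cross-entropy plus a weighted hierarchy penalty."""
    base = sum(bce(probs[label], targets[label]) for label in probs) / len(probs)
    return base + weight * hierarchy_penalty(probs)

targets = {"injection": 1, "memory": 0, "xss": 1, "sqli": 0, "oob_read": 0}
consistent = {"injection": 0.9, "memory": 0.1, "xss": 0.8, "sqli": 0.1, "oob_read": 0.05}
# Same child scores, but the parent "injection" is scored below its child "xss".
violating = dict(consistent, injection=0.3)

print(hierarchy_penalty(consistent), hierarchy_penalty(violating))
```

The penalty is zero whenever every child score stays at or below its parent's score, so predictions that already respect the hierarchy are trained by the cross-entropy term alone.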

Multi-Task Learning Architecture

  • Shared Encoder: Encode vulnerability description semantics based on CodeBERT/SecureBERT
  • Classification System-Specific Heads: Set up dedicated classification layers for each system
  • Cross-System Attention: Let the classification heads exchange information, so that correlations between systems improve each head's predictions
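
The shared-encoder layout can be sketched in plain Python. Everything here is a stand-in chosen for illustration: `toy_encoder` replaces the pretrained CodeBERT/SecureBERT encoder with hashed bag-of-words counts, the heads are untrained random linear layers, the label sets are small hypothetical subsets, and cross-system attention is left out.

```python
import random

def toy_encoder(text, dim=16):
    """Stand-in for a pretrained encoder: hashed bag-of-words counts."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[sum(ord(ch) for ch in token) % dim] += 1.0
    return vec

class LinearHead:
    """One classification layer dedicated to a single classification system."""
    def __init__(self, in_dim, labels, rng):
        self.labels = labels
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in labels]

    def __call__(self, features):
        return {label: sum(w * x for w, x in zip(row, features))
                for label, row in zip(self.labels, self.weights)}

class MultiSystemClassifier:
    def __init__(self, label_sets, dim=16, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.heads = {name: LinearHead(dim, labels, rng)
                      for name, labels in label_sets.items()}

    def predict(self, text):
        features = toy_encoder(text, self.dim)  # encoded once, shared by all heads
        result = {}
        for name, head in self.heads.items():
            scores = head(features)
            result[name] = max(scores, key=scores.get)
        return result

# Hypothetical label subsets for two systems.
LABEL_SETS = {
    "cwe": ["CWE-79", "CWE-89", "CWE-125"],
    "cvss_severity": ["LOW", "MEDIUM", "HIGH", "CRITICAL"],
}
model = MultiSystemClassifier(LABEL_SETS)
result = model.predict("SQL injection in login form allows remote attackers to read data")
print(result)
```

Because the weights are random, the predicted labels carry no meaning; the point is the structure, in which one encoding pass feeds every system-specific head.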

Section 04

Experimental Evaluation Results and Analysis

Dataset and Setup

  • Datasets: NVD (200,000+ CVE records), CVEFixes (vulnerability descriptions + fix code), Devign (manually labeled tags)
  • Evaluation Metrics: Classification accuracy, hierarchical consistency, cross-system consistency
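
The source does not define hierarchical consistency precisely. A plausible reading, assumed here, is the fraction of samples whose predicted fine-grained label is a child of the predicted coarse-grained label. The hierarchy and predictions below are hypothetical.

```python
def hierarchical_consistency(predictions, parent_of):
    """Fraction of (coarse, fine) predictions where `fine` is a child of `coarse`."""
    if not predictions:
        return 0.0
    consistent = sum(1 for coarse, fine in predictions if parent_of.get(fine) == coarse)
    return consistent / len(predictions)

# Hypothetical hierarchy: fine-grained label -> coarse-grained parent.
parent_of = {"xss": "injection", "sqli": "injection", "oob_read": "memory"}

# Hypothetical model outputs as (coarse, fine) pairs; the third pair violates the hierarchy.
predictions = [
    ("injection", "xss"),
    ("memory", "oob_read"),
    ("memory", "sqli"),
    ("injection", "sqli"),
]

score = hierarchical_consistency(predictions, parent_of)
print(score)  # 0.75
```

Under this definition the violation rate is simply `1 - score`, which is the quantity a constraint-violation figure would report.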

Key Results

  • Single-System Performance: 2-5% higher accuracy than standard fine-tuning
  • Multi-System Joint Classification: Maintain performance across systems while reducing parameters and inference time
  • Hierarchical Consistency: Predictions almost always satisfy the hierarchy constraints, versus a 15% violation rate with standard methods
  • Data Efficiency: The advantage is larger when data is sparse, since the model learns effectively from limited annotations

Section 05

Application Value and Industry Impact of the Method

Automated Vulnerability Management

Integrate the model into scanning and management systems to classify vulnerabilities automatically along multiple dimensions, provide structured results and risk assessments, and reduce manual review

Threat Intelligence Enhancement

Multi-classification labeling enables platforms to provide richer structured information, allowing analysts to track CWE weakness trends or CAPEC attack techniques

Security Training and Knowledge Management

Hierarchical classification results serve as training materials to help developers understand vulnerability characteristics; structured knowledge bases facilitate the accumulation and management of security knowledge

Section 06

Current Limitations and Future Research Directions

Current Limitations

  • Evolution of Classification Systems: The model needs regular retraining to keep up with updates to the classification systems
  • Multilingual Support: The method mainly targets English, with limited support for other languages
  • Code Context: Code information is not yet fully used

Future Directions

  • Continuous Learning: Adapt to new vulnerability patterns and system changes without full retraining
  • Multimodal Fusion: Combine text, code, exploit videos, and other multimodal information
  • Causal Reasoning: Understand the causal mechanisms of vulnerabilities and predict derivative vulnerabilities
  • Adversarial Robustness: Improve robustness against adversarial samples that manipulate vulnerability descriptions

Section 07

Conclusion: Future Outlook for Intelligent Vulnerability Management

The hierarchical fine-tuned language model method proposed in this study addresses vulnerability classification across multiple classification systems and improves accuracy, consistency, and efficiency. As software complexity and security threats evolve, intelligent vulnerability management becomes increasingly important. We look forward to this research promoting the deep integration of cybersecurity and artificial intelligence, helping to build a safer digital world.