Zing Forum

MACyber: Multi-source Alignment Benchmark and 12B Large Model for Cybersecurity Domain

The MACyber project provides a comprehensive benchmark dataset covering seven major security domains, an evaluation toolchain, and a supporting 12B-parameter threat intelligence-enhanced large model, establishing a standardized framework for cybersecurity AI capability evaluation.

Tags: Cybersecurity, Benchmark, Large Language Model, Threat Intelligence, RAG, MACyber, Security Evaluation, AI Security
Published 2026-05-21 17:43 | Recent activity 2026-05-21 17:48 | Estimated read 5 min

Section 01

MACyber Project Introduction: A New Standardized Paradigm for Cybersecurity AI Evaluation

To address the difficulty of evaluating cybersecurity AI capabilities, the MACyber project has built a comprehensive benchmark system covering seven major security domains, developed a supporting 12B-parameter threat intelligence-enhanced large model, and provided a standardized evaluation toolchain. The project rests on two core components: MACyber-INT (the multi-source alignment benchmark dataset) and MACyber-12B (the dedicated large model).

Section 02

Project Background: Pain Points and Positioning of Cybersecurity AI Evaluation

Cybersecurity data is highly heterogeneous and specialized, and existing general-purpose benchmarks (such as MMLU) lack in-depth coverage of the security domain. The MACyber team proposed the concept of "multi-source alignment", which integrates data from multiple scenarios through a unified framework. The project is open-sourced by the qcydm team and is positioned as a standardized evaluation system driven by both a benchmark and a model.

Section 03

Technical Approach: Data Schema, Model Architecture, and Evaluation System

  1. Unified Data Schema: Each record has five components: metadata, feature data, label information, reasoning process (evidence chain + analysis logic), and response suggestions;
  2. MACyber-12B Model: Has a built-in dual-channel RAG architecture (exact matching for known attacks, similarity-based reference for unknown attacks);
  3. Evaluation System: Four-dimensional weighted scoring (reasoning: 40%, threat classification: 30%, response suggestions: 20%, severity level: 10%), using Qwen3-Max as the judge and supporting automated batch evaluation.
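
To make the four-dimensional weighting concrete, here is a minimal Python sketch of how per-dimension judge scores could be combined into a single total. Only the weights come from the description above; the dimension key names, the 0-100 score scale, and the `weighted_score` function are assumptions for illustration and are not taken from the MACyber toolchain.

```python
from typing import Mapping

# Weights as listed in the article. The key names are hypothetical.
WEIGHTS = {
    "reasoning": 0.40,
    "threat_classification": 0.30,
    "response_suggestions": 0.20,
    "severity_level": 0.10,
}


def weighted_score(dimension_scores: Mapping[str, float]) -> float:
    """Combine per-dimension scores (assumed to be on a 0-100 scale)."""
    missing = WEIGHTS.keys() - dimension_scores.keys()
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * dimension_scores[name] for name in WEIGHTS)


# Made-up judge scores for a single sample, purely for illustration.
example = {
    "reasoning": 80,
    "threat_classification": 90,
    "response_suggestions": 70,
    "severity_level": 100,
}
total = weighted_score(example)
print(round(total, 2))
```

Because reasoning carries 40% of the weight, a model that classifies a threat correctly but explains it poorly would still lose a large share of the total under this scheme.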

Section 04

Domain Coverage: A Panoramic View of Seven Major Security Domains

The benchmark covers 31 datasets spanning seven core domains:

  • Network traffic security: Identify anomalies such as DDoS and port scanning;
  • IoT security: Analyze device behavior patterns and anomalies;
  • System log security: Detect events like privilege escalation and abnormal login;
  • DNS security: Identify abuses such as tunneling and DGA;
  • Web security: Covers OWASP Top 10 attacks;
  • Vulnerability intelligence: CVE description and risk assessment;
  • Threat intelligence: Comprehensive analysis of multi-source information.

Section 05

Application Value: From Vendor Selection to Practical Deployment

  • Security vendors: Objectively evaluate model capabilities to assist product selection;
  • Researchers: Fills the gap in standardized benchmarks for the security domain and supports experimental comparison;
  • Practical scenarios: The model can be directly used for SOC intelligent decision-making, threat intelligence analysis, and audit report generation; the dataset can be used for fine-tuning security models.

Section 06

Open Source Ecosystem and Future Outlook

The project is fully open source on GitHub and provides data conversion tools and Schema validation mechanisms. Future plans include expanding the dataset to emerging domains such as cloud security and supply chain security and exploring dedicated security models with more parameters, with the goal of becoming the de facto standard for AI evaluation in the security domain.
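
This summary does not describe how the project's Schema validation works. As a rough Python sketch of the idea, the check below confirms that a record carries the five components of the unified data schema, plus the two parts of the reasoning process. All field names (`metadata`, `features`, `labels`, `reasoning`, `response_suggestions`, `evidence_chain`, `analysis_logic`) and the `validate_record` function are hypothetical and may not match the names used in the actual repository.

```python
from collections.abc import Mapping
from typing import List

# Hypothetical field names for the five schema components named in the article.
REQUIRED_FIELDS = (
    "metadata",
    "features",
    "labels",
    "reasoning",
    "response_suggestions",
)
# The reasoning process is described as evidence chain + analysis logic.
REASONING_FIELDS = ("evidence_chain", "analysis_logic")


def validate_record(record: Mapping) -> List[str]:
    """Return a list of problems; an empty list means the record passed."""
    problems = [
        f"missing field: {name}" for name in REQUIRED_FIELDS if name not in record
    ]
    if "reasoning" in record:
        reasoning = record["reasoning"]
        if isinstance(reasoning, Mapping):
            problems += [
                f"missing reasoning field: {name}"
                for name in REASONING_FIELDS
                if name not in reasoning
            ]
        else:
            problems.append("reasoning must be a mapping")
    return problems


# Illustrative record with made-up values; response_suggestions is left out
# on purpose so that the validator reports it.
sample = {
    "metadata": {"source": "example", "domain": "dns"},
    "features": {"query": "abc123.example.com"},
    "labels": {"threat": "dga"},
    "reasoning": {"evidence_chain": ["high-entropy subdomain"]},
}
issues = validate_record(sample)
print(issues)
```

Returning a list of problems rather than raising on the first one suits batch conversion, where a report of every malformed record is usually more useful than stopping at the first failure.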