# MACyber: Multi-Source Aligned Benchmark and Domain-Specific Large Language Model for Cybersecurity

> The MACyber project provides the MACyber-INT multi-source aligned cybersecurity benchmark dataset and the MACyber-12B domain-specific large language model, covering seven key areas: network traffic, IoT, system logs, DNS, Web security, vulnerability intelligence, and threat intelligence. It offers a standardized toolset for evaluating AI models in the cybersecurity field.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-05-26T09:45:26.000Z
- Last activity: 2026-05-26T09:49:35.710Z
- Popularity: 159.9
- Keywords: cybersecurity, benchmark, large language model, threat intelligence, RAG, AI security, vulnerability detection, intrusion detection
- Page link: https://www.zingnex.cn/en/forum/thread/macyber
- Canonical: https://www.zingnex.cn/forum/thread/macyber
- Markdown source: floors_fallback

---


## Original Author and Source

- **Original Author/Maintainer:** qcydm
- **Source Platform:** GitHub
- **Original Title:** MACyber: Multi-Source Aligned Cybersecurity Benchmark (MACyber-INT) and Large Language Model (MACyber-12B)
- **Original Link:** https://github.com/qcydm/MACyber
- **Publication Date:** May 26, 2026

## Project Overview

MACyber is a comprehensive open-source project focused on the cybersecurity domain, consisting of two core components: the MACyber-INT benchmark dataset and the MACyber-12B large language model. The project aims to address the lack of standardized evaluation tools for AI models in cybersecurity, providing researchers and practitioners with a structured framework for evaluating security intelligence data.

Cybersecurity threats are growing more complex, and traditional rule-based security systems struggle to handle novel attack methods. Large language models show promise for cybersecurity applications, but targeted benchmarks for assessing their real capabilities are lacking. MACyber aims to fill this gap by building an evaluation system that covers seven key security areas through multi-source data alignment.

## Benchmark Architecture

The MACyber-INT benchmark comprises 31 datasets organized into seven high-level security domains.

## Seven Key Security Domains

1. **Network Traffic Security**
   Covers threat detection at the network communication level, including scenarios like abnormal traffic identification and intrusion detection.

2. **IoT Security**
   Addresses the specific security needs of IoT devices and evaluates models' capabilities in IoT threat identification.

3. **System Log Security**
   Discovers potential security incidents and abnormal behaviors through system log analysis.

4. **DNS Security Threat**
   Focuses on attack detection at the DNS level, including DNS tunneling and DDoS attacks.

5. **Web Security Threat**
   Covers attacks at the Web application level, such as SQL injection, XSS, and CSRF.

6. **Vulnerability Intelligence**
   Evaluates models' understanding of known vulnerabilities and their ability to identify new vulnerabilities.

7. **Threat Intelligence**
   Covers broad analysis of threat information, including attacker profiling and attack method identification.
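
One way to picture how results from many datasets roll up into the seven domains is a per-domain accuracy with a macro average across domains. The sketch below is only an illustration: the source does not state which metrics MACyber-INT uses, and the domain strings, the `(domain, correct)` result format, and both function names are assumptions made for this example.

```python
from collections import defaultdict

# Domain names paraphrased from the list above; the exact identifiers
# used in the MACyber repository are not given in the source.
DOMAINS = (
    "Network Traffic", "IoT", "System Logs", "DNS",
    "Web Security", "Vulnerability Intelligence", "Threat Intelligence",
)

def per_domain_accuracy(results):
    """results: iterable of (domain, correct) pairs, one per evaluated sample."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for domain, correct in results:
        if domain not in DOMAINS:
            raise ValueError(f"unknown domain: {domain}")
        totals[domain] += 1
        hits[domain] += int(bool(correct))
    return {domain: hits[domain] / totals[domain] for domain in totals}

def macro_average(scores):
    """Unweighted mean over domains, so small domains count as much as large ones."""
    return sum(scores.values()) / len(scores)

# Hypothetical results, for illustration only.
demo_results = [("DNS", True), ("DNS", False), ("IoT", True)]
demo_scores = per_domain_accuracy(demo_results)
demo_macro = macro_average(demo_scores)
```

A macro average is chosen here so that a domain with few datasets is not drowned out by a domain with many; a sample-weighted average would be an equally plausible choice.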

## Data Schema Design

MACyber uses a structured JSON data schema, where each sample includes the following key fields:

- **Metadata (meta):** Contains category and subcategory information for data classification and retrieval
- **Feature Data (json):** Stores specific security features, such as network traffic features and log fields
- **Label Information (label):** Includes official threat labels and severity levels (Benign/Suspicious/Low/Medium/High)
- **Reasoning Process (reasoning):** Provides evidence chains and analysis logic, which is a key feature of MACyber
- **Response Recommendations (response):** Includes the suggested response action (No Action/Monitor/Block) and its justification

This design provides a standard input-output format together with an interpretable reasoning process, so model evaluation can consider both the accuracy of the result and the soundness of the reasoning behind it.
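
To make the schema concrete, here is a minimal sketch of what one sample might look like and how it could be checked. The five top-level field names, the severity levels, and the action values come from the description above. Everything nested inside them (`category`, `subcategory`, `threat`, `severity`, `action`, `justification`, the feature keys) and the sample values are assumptions invented for illustration, not the project's actual layout.

```python
import json

REQUIRED_FIELDS = ("meta", "json", "label", "reasoning", "response")
SEVERITY_LEVELS = {"Benign", "Suspicious", "Low", "Medium", "High"}
RESPONSE_ACTIONS = {"No Action", "Monitor", "Block"}

def validate_sample(sample):
    """Return a list of problems found in one sample (empty list means it passed)."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in sample]

    # Nested key names below are hypothetical; adjust to the real schema.
    label = sample.get("label")
    severity = label.get("severity") if isinstance(label, dict) else None
    if severity not in SEVERITY_LEVELS:
        problems.append(f"unexpected severity: {severity!r}")

    response = sample.get("response")
    action = response.get("action") if isinstance(response, dict) else None
    if action not in RESPONSE_ACTIONS:
        problems.append(f"unexpected action: {action!r}")
    return problems

# A made-up sample, for illustration only.
example = json.loads("""
{
  "meta": {"category": "DNS", "subcategory": "DNS tunneling"},
  "json": {"query_name": "a1b2c3.example.test", "query_length": 63},
  "label": {"threat": "DNS tunneling", "severity": "High"},
  "reasoning": "Long, high-entropy subdomain labels are consistent with data exfiltration over DNS.",
  "response": {"action": "Block", "justification": "Traffic matches a tunneling pattern."}
}
""")

problems = validate_sample(example)
```

Returning a list of problems rather than raising on the first one makes it easier to audit a whole dataset file in a single pass.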

## MACyber-12B Model

The project also provides the MACyber-12B large language model, which is specifically trained for the cybersecurity domain. This model includes two important components:

## CyberLoRA

A low-rank adapter optimized for cybersecurity tasks. By injecting cybersecurity domain expertise into the base large model, it enhances the model's performance on security-related tasks.
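
For readers unfamiliar with low-rank adapters, the general LoRA idea is to keep a base weight matrix `W` frozen and learn a small update `B·A` of rank `r`, so the adapted layer computes `W·x + (alpha / r)·B·(A·x)`. The pure-Python sketch below illustrates that generic technique only. The source does not describe CyberLoRA's rank, scaling, or target layers, so every name and number here is an assumption for illustration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(x, w, a, b, alpha):
    """Generic LoRA forward pass: W x + (alpha / r) * B (A x).

    w: frozen base weights, shape d_out x d_in
    a: trainable down-projection, shape r x d_in
    b: trainable up-projection, shape d_out x r
    """
    r = len(a)  # adapter rank = number of rows in A
    base = matvec(w, x)
    update = matvec(b, matvec(a, x))
    scale = alpha / r
    return [h + scale * u for h, u in zip(base, update)]

def adapter_param_count(d_out, d_in, r):
    """Trainable parameters in the adapter, versus d_out * d_in for full fine-tuning."""
    return r * (d_in + d_out)

# Toy numbers, for illustration only.
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # 2 x 3 frozen base weights
a = [[1.0, 1.0, 1.0]]                    # rank r = 1
b_zero = [[0.0], [0.0]]                  # B starts at zero in standard LoRA
b_trained = [[1.0], [0.0]]
x = [1.0, 2.0, 3.0]

unchanged = lora_forward(x, w, a, b_zero, alpha=2.0)
adapted = lora_forward(x, w, a, b_trained, alpha=2.0)
```

Because `B` is initialized to zero, the adapted model starts out identical to the base model, and only the small `A` and `B` matrices need to be trained and stored.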
