MACyber: Multi-Source Aligned Benchmark and Domain-Specific Large Language Model for Cybersecurity

The MACyber project provides the MACyber-INT multi-source aligned cybersecurity benchmark dataset and the MACyber-12B domain-specific large language model, covering seven key areas: network traffic, IoT, system logs, DNS, Web security, vulnerability intelligence, and threat intelligence. It offers a standardized toolset for evaluating AI models in the cybersecurity field.

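Tags: Cybersecurity, Benchmarking, Large Language Models, Threat Intelligence, RAG, AI Security, Vulnerability Detection, Intrusion Detection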

Published 2026-05-26 17:45 | Recent activity 2026-05-26 17:49 | Estimated read 7 min

Section 01

Introduction: MACyber: Multi-Source Aligned Benchmark and Domain-Specific Large Language Model for Cybersecurity

Section 02

Original Author and Source

Original Author/Maintainer: qcydm
Source Platform: GitHub
Original Title: MACyber: Multi-Source Aligned Cybersecurity Benchmark (MACyber-INT) and Large Language Model (MACyber-12B)
Original Link: https://github.com/qcydm/MACyber
Publication Date: May 26, 2026

Section 03

Project Overview

MACyber is a comprehensive open-source project focused on the cybersecurity domain, consisting of two core components: the MACyber-INT benchmark dataset and the MACyber-12B large language model. The project aims to address the lack of standardized evaluation tools for AI models in cybersecurity, providing researchers and practitioners with a structured framework for evaluating security intelligence data.

Cybersecurity threats are growing more complex, and traditional rule-based security systems struggle with new attack methods. Large language models show promise for cybersecurity work, but targeted benchmarks for assessing their real capabilities are lacking. MACyber addresses this gap by aligning multi-source data into an evaluation suite that covers seven key security areas.

Section 04

Benchmark Architecture

The MACyber-INT benchmark dataset includes 31 datasets, organized into seven high-level security domains:

Section 05

Seven Key Security Domains

Network Traffic Security: Covers threat detection at the network communication level, including scenarios such as abnormal traffic identification and intrusion detection.
IoT Security: Addresses the specific security needs of IoT devices and evaluates models' capabilities in IoT threat identification.
System Log Security: Discovers potential security incidents and abnormal behaviors through system log analysis.
DNS Security Threat: Focuses on attack detection at the DNS level, including DNS tunneling and DDoS attacks.
Web Security Threat: Covers attacks at the Web application level, such as SQL injection, XSS, and CSRF.
Vulnerability Intelligence: Evaluates models' understanding of known vulnerabilities and their ability to identify new vulnerabilities.
Threat Intelligence: Covers comprehensive threat information analysis, including attacker profiling and attack method identification.

Section 06

Data Schema Design

MACyber uses a structured JSON data schema, where each sample includes the following key fields:

Metadata (meta): Contains category and subcategory information for data classification and retrieval
Feature Data (json): Stores specific security features, such as network traffic features and log fields
Label Information (label): Includes official threat labels and severity levels (Benign/Suspicious/Low/Medium/High)
Reasoning Process (reasoning): Provides evidence chains and analysis logic, which is a key feature of MACyber
Response Recommendations (response): Includes suggested disposal actions (No Action/Monitor/Block) and their justifications
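
The five fields above can be illustrated with a small validation sketch. The top-level names (meta, json, label, reasoning, response), the severity levels, and the disposal actions come from the description. The nested keys (threat, severity, action, justification), the feature names, and the validate_sample helper are assumptions made for illustration and may not match the project's actual files.

```python
from typing import Any, Dict, List

# Top-level field names, severity levels, and actions follow the article.
REQUIRED_FIELDS = ("meta", "json", "label", "reasoning", "response")
SEVERITY_LEVELS = {"Benign", "Suspicious", "Low", "Medium", "High"}
RESPONSE_ACTIONS = {"No Action", "Monitor", "Block"}


def validate_sample(sample: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one sample (empty if none)."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in sample]
    if problems:
        return problems
    for key in ("category", "subcategory"):
        if key not in sample["meta"]:
            problems.append(f"meta lacks {key}")
    # "severity" and "action" are assumed key names, not confirmed by the source.
    if sample["label"].get("severity") not in SEVERITY_LEVELS:
        problems.append("unknown severity level")
    if sample["response"].get("action") not in RESPONSE_ACTIONS:
        problems.append("unknown response action")
    return problems


# Hypothetical sample; every value here is invented for illustration.
example = {
    "meta": {"category": "DNS Security Threat", "subcategory": "DNS tunneling"},
    "json": {"query_length": 212, "entropy": 4.7},
    "label": {"threat": "DNS tunneling", "severity": "High"},
    "reasoning": "Long, high-entropy query names suggest encoded payloads.",
    "response": {"action": "Block", "justification": "Likely data exfiltration."},
}

print(validate_sample(example))  # prints []
```

A check like this would typically run once when loading a dataset, so that malformed samples are reported before any model is evaluated.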

Beyond a standard input-output format, this design includes an interpretable reasoning process, so evaluation can assess the soundness of a model's reasoning as well as the accuracy of its results.
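
The accuracy half of such an evaluation could be sketched as exact-match scoring on severity and disposal action. The flattened record layout and the field_accuracy helper are hypothetical, and the source does not state which metrics MACyber actually uses. Judging the quality of the reasoning field would need a separate rubric or human review, which this sketch does not attempt.

```python
from typing import Dict, List


def field_accuracy(gold: List[Dict[str, str]], predicted: List[Dict[str, str]]) -> Dict[str, float]:
    """Exact-match accuracy for the severity level and the response action."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and equal in length")
    n = len(gold)
    severity_hits = sum(g["severity"] == p["severity"] for g, p in zip(gold, predicted))
    action_hits = sum(g["action"] == p["action"] for g, p in zip(gold, predicted))
    return {"severity": severity_hits / n, "action": action_hits / n}


# Invented reference labels and model outputs, for illustration only.
gold = [
    {"severity": "High", "action": "Block"},
    {"severity": "Benign", "action": "No Action"},
    {"severity": "Medium", "action": "Monitor"},
    {"severity": "Low", "action": "Monitor"},
]
predicted = [
    {"severity": "High", "action": "Block"},
    {"severity": "Benign", "action": "Monitor"},
    {"severity": "Low", "action": "Monitor"},
    {"severity": "Low", "action": "Monitor"},
]

scores = field_accuracy(gold, predicted)
print(scores)  # prints {'severity': 0.75, 'action': 0.75}
```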

Section 07

MACyber-12B Model

The project also provides the MACyber-12B large language model, which is specifically trained for the cybersecurity domain. This model includes two important components:

Section 08

CyberLoRA

A low-rank adapter optimized for cybersecurity tasks. By injecting cybersecurity domain expertise into the base model, it improves performance on security-related tasks.
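
To show what a low-rank adapter does in general, the sketch below applies the standard LoRA update W' = W + (alpha / r) * B A to a tiny weight matrix in pure Python. It illustrates the generic technique only. The source does not describe CyberLoRA's rank, scaling, or target layers, so every value and helper name here is an assumption.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (m x k) and b (k x n)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w: Matrix, a: Matrix, b: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / r) * B @ A, where r is the adapter rank (rows of A)."""
    rank = len(a)
    scale = alpha / rank
    delta = matmul(b, a)  # low-rank update with the same shape as W
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy values: W stays frozen, while only A and B would be trained.
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # base weight, shape 2 x 3
b = [[1.0], [2.0]]                       # shape 2 x 1 (output dim x rank)
a = [[0.5, 0.0, 1.0]]                    # shape 1 x 3 (rank x input dim)

print(lora_effective_weight(w, a, b, alpha=2.0))
# prints [[2.0, 0.0, 2.0], [2.0, 1.0, 4.0]]
```

Because only A and B are trained, the number of trainable parameters scales with the rank rather than with the full weight matrix, which is why adapters of this kind are a common way to specialize a large base model.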
