Reading

GAMMAF: A Graph Anomaly Detection Evaluation Framework for LLM Multi-Agent Systems

GAMMAF is an open-source evaluation framework that focuses on generating synthetic interaction datasets for large language model (LLM)-based multi-agent systems and evaluating topology-guided defense methods against system integrity attacks.

LLMmulti-agentanomaly detectiongraphsecuritybenchmark

Published 2026-04-22 22:13Recent activity 2026-04-22 22:20Estimated read 7 min

Section 01

[Introduction] GAMMAF: A Graph Anomaly Detection Evaluation Framework for LLM Multi-Agent Systems

GAMMAF is an open-source evaluation framework developed by the UC3M (Universidad Carlos III de Madrid) team. It focuses on generating synthetic interaction datasets for LLM multi-agent systems and evaluating topology-guided defense methods against system integrity attacks. It fills the tool gap in the field of LLM-MAS security evaluation and provides a standardized experimental platform for research in this area.

Section 02

Project Background and Motivation

With the rapid development of LLM multi-agent systems (LLM-MAS), agent collaboration and communication have become key capabilities for complex tasks. However, the distributed architecture brings new security challenges: malicious agents may inject false information or manipulate communication to undermine system integrity. Traditional security evaluation methods struggle to capture the complex interaction patterns of multi-agent systems, so there is an urgent need for dedicated datasets and evaluation tools to test the effectiveness of defense mechanisms. As a comprehensive evaluation architecture, GAMMAF aims to generate synthetic interaction datasets and benchmark defense models.

Section 03

Core Dual-Pipeline Architecture Design

GAMMAF adopts a dual-pipeline architecture:

Training Data Generation Phase

It simulates debate scenarios under different network topologies, captures agent interaction behaviors, and represents them as attribute graphs (encoding both communication content and topology information). Users can configure debate topics, number of agents, and topology types (fully connected, ring, star, random networks, etc.) to generate customized training data.

Defense System Evaluation Phase

It dynamically evaluates defense models during real-time inference. When suspicious behaviors are detected, it actively isolates adversarial agent nodes and observes the collaboration effect of the remaining network, truly reflecting the actual deployment performance of the defense strategy.

Section 04

Technical Implementation and Extension Interfaces

GAMMAF is developed based on Python 3.11, uses conda for environment management, and is compatible with any inference service that conforms to the OpenAI API specification. For local deployment, vLLM is recommended as the backend. Core scripts include TrainDataGeneration.py (for training data generation) and MainEvaluation.py (for defense evaluation), both of which manage parameters via YAML configuration files. The framework provides extension interfaces to support adding new defense models, text processing logic, and task datasets. Its modular design facilitates integrating algorithms and comparing with existing benchmarks.

Section 05

Application Scenarios and Research Value

The main application scenarios of GAMMAF are:

Defense Mechanism Development: Use standardized datasets to test new anomaly detection algorithms and avoid repeated data collection;
Model Comparison Analysis: Use unified metrics and environments to fairly compare the advantages and disadvantages of different defense strategies;
Attack Pattern Research: Use the controllability of synthetic data to systematically study the impact of various attacks on multi-agent systems;
Teaching Demonstration: Provide an intuitive experimental platform for academic courses to help understand the vulnerability of distributed systems.

Section 06

Usage Guide and Future Outlook

Usage Guide

Environment setup: Create a conda environment and install dependencies, configure LLM backend parameters (BASE_URL, API_KEY, MODEL_NAME), and use predefined configuration templates to start the data generation and evaluation processes. For in-depth customization, refer to the documentation to modify the architecture to adapt to specific needs (add custom defense models, adjust text processing logic, introduce new task datasets).

Future Outlook

As an open-source project, it welcomes community contributions. Future plans include expanding more network topology types, supporting more complex attack scenarios, integrating visualization tools to display evaluation results, and promoting the development of LLM-MAS security research.

Continue Reading

Keep going with more reads from the same topic.

Nornir MCP Server: An Enterprise-Grade Bridge for Integrating Large Language Models into Network Automation

Nornir MCP Server is an enterprise-level server based on the Model Context Protocol (MCP). It seamlessly integrates large language models (such as Claude) with the Nornir network automation framework, supporting natural language orchestration for multi-vendor network devices (Cisco, Arista, Juniper, etc.), and providing production-grade features like a dual-engine architecture (NAPALM + Netmiko), intelligent filtering, and a secure sandbox.

Recent activity 2026-05-06 20:51

Bibliothèque Française LLM: A French Public Domain Literature Index System Optimized for Large Language Models

Bibliothèque Française LLM is a structured indexing and annotation project for French public domain literature designed specifically for large language models (LLMs). It integrates multiple authoritative sources such as DraCor, Common Corpus, and Wikisource, providing metadata indexing categorized by genre, author, and era, as well as in-depth annotations for dramatic texts (including characters, lines, stage directions, etc.). Its aim is to enable LLMs to efficiently read and understand classic French literary works.

Recent activity 2026-05-06 20:50

Splinter: A Lock-Free Zero-Copy Shared Memory KV and Vector Storage Library That Eliminates Socket and Memcpy Overhead for LLM Inference

Splinter is a minimalist, high-performance key-value (KV) and vector storage system enabling zero-latency inter-process communication via shared memory and atomic operations. With only 766 lines of core code, it supports millions of operations per second and 768-dimensional vector storage, offering a new architectural approach for local LLM inference and data-intensive applications.

Recent activity 2026-04-03 08:49

Building an AWS Generative AI Application from Scratch: EC2 + Bedrock Hands-On Tutorial

A complete cloud-native AI application development guide for beginners, building a simple generative AI chatbot using Amazon EC2, Apache, Python CGI, and Amazon Bedrock, covering architecture design, IAM permission configuration, security best practices, and cost optimization suggestions.

Recent activity 2026-06-02 19:49