GAMMAF: A Graph Anomaly Detection Evaluation Framework for LLM Multi-Agent Systems

GAMMAF is an open-source evaluation framework that focuses on generating synthetic interaction datasets for large language model (LLM)-based multi-agent systems and evaluating topology-guided defense methods against system integrity attacks.

Tags: LLM · multi-agent · anomaly detection · graph · security · benchmark
Published 2026-04-22 22:13 · Recent activity 2026-04-22 22:20 · Estimated read: 7 min

Section 01

[Introduction] GAMMAF: A Graph Anomaly Detection Evaluation Framework for LLM Multi-Agent Systems

GAMMAF is an open-source evaluation framework developed by the UC3M (Universidad Carlos III de Madrid) team. It generates synthetic interaction datasets for LLM multi-agent systems and evaluates topology-guided defense methods against system integrity attacks. By filling a tooling gap in LLM-MAS security evaluation, it provides a standardized experimental platform for research in this area.

Section 02

Project Background and Motivation

With the rapid development of LLM multi-agent systems (LLM-MAS), agent collaboration and communication have become key capabilities for complex tasks. Their distributed architecture, however, introduces new security challenges: malicious agents may inject false information or manipulate communication to undermine system integrity. Traditional security evaluation methods struggle to capture the complex interaction patterns of multi-agent systems, so dedicated datasets and evaluation tools are needed to test the effectiveness of defense mechanisms. GAMMAF addresses this need as a comprehensive evaluation architecture for generating synthetic interaction datasets and benchmarking defense models.

Section 03

Core Dual-Pipeline Architecture Design

GAMMAF adopts a dual-pipeline architecture:

Training Data Generation Phase

It simulates debate scenarios under different network topologies, captures agent interaction behaviors, and represents them as attributed graphs that encode both communication content and topology. Users can configure the debate topic, the number of agents, and the topology type (fully connected, ring, star, random networks, etc.) to generate customized training data.
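As a rough sketch of the configurable topologies, the four families named above can be generated as edge sets over the agents; the function name and edge-set representation below are illustrative, not taken from the GAMMAF codebase:

```python
import itertools
import random

def build_topology(kind: str, n: int, seed: int = 0) -> set[frozenset[int]]:
    """Edge set of a communication graph over agents 0..n-1."""
    pairs = itertools.combinations(range(n), 2)
    if kind == "fully_connected":
        return {frozenset(p) for p in pairs}
    if kind == "ring":
        return {frozenset((i, (i + 1) % n)) for i in range(n)}
    if kind == "star":
        return {frozenset((0, i)) for i in range(1, n)}  # agent 0 is the hub
    if kind == "random":
        rng = random.Random(seed)  # seeded, so datasets are reproducible
        return {frozenset(p) for p in pairs if rng.random() < 0.5}
    raise ValueError(f"unknown topology: {kind}")

print(len(build_topology("fully_connected", 5)))  # 5 agents -> 10 links
```

Each edge can then be annotated with the messages exchanged along it, yielding an attributed graph that carries both content and topology.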

Defense System Evaluation Phase

It evaluates defense models dynamically during real-time inference. When suspicious behavior is detected, it actively isolates the adversarial agent nodes and observes how well the remaining network continues to collaborate, reflecting how the defense strategy would perform in an actual deployment.
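The isolation step can be pictured as pruning every communication link that touches a flagged agent. A minimal sketch on a simple edge-set representation of the communication graph (names are illustrative, not the framework's API):

```python
def isolate_agents(edges: set[frozenset[int]],
                   flagged: set[int]) -> tuple[set[frozenset[int]], set[int]]:
    """Drop every communication link touching a flagged adversarial agent.

    Returns the surviving edge set and the agents that still have at least
    one neighbour, so the remaining network's collaboration can be scored.
    """
    kept = {e for e in edges if not (e & flagged)}
    survivors = set().union(*kept) if kept else set()
    return kept, survivors

# Fully connected network of 4 agents; agent 2 is flagged as adversarial.
edges = {frozenset(p) for p in [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]}
kept, survivors = isolate_agents(edges, {2})
print(sorted(survivors))  # -> [0, 1, 3]
```

Scoring the pruned network rather than the original one is what makes the evaluation reflect deployment behavior: a defense that flags too aggressively shrinks the surviving network and pays for it in collaboration quality.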

Section 04

Technical Implementation and Extension Interfaces

GAMMAF is built on Python 3.11, uses conda for environment management, and is compatible with any inference service that conforms to the OpenAI API specification; for local deployment, vLLM is the recommended backend. The core scripts are TrainDataGeneration.py (training data generation) and MainEvaluation.py (defense evaluation), both of which take their parameters from YAML configuration files. The framework exposes extension interfaces for adding new defense models, text processing logic, and task datasets, and its modular design makes it easy to integrate new algorithms and compare them against existing benchmarks.
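Since both scripts read their parameters from YAML, a configuration might look roughly like the fragment below. Every key name here is hypothetical, chosen only to mirror the options the article mentions, not copied from the repository:

```yaml
# Hypothetical GAMMAF-style configuration; key names are illustrative.
llm:
  base_url: "http://localhost:8000/v1"  # any OpenAI-compatible endpoint (e.g. vLLM)
  api_key: "EMPTY"                      # placeholder for local servers
  model_name: "my-local-model"          # placeholder, not a real model id
debate:
  topic: "Should agents verify each other's claims?"
  num_agents: 6
  topology: "ring"    # fully_connected | ring | star | random
```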

Section 05

Application Scenarios and Research Value

The main application scenarios of GAMMAF are:

  • Defense Mechanism Development: Use standardized datasets to test new anomaly detection algorithms and avoid repeated data collection;
  • Model Comparison Analysis: Use unified metrics and environments to fairly compare the advantages and disadvantages of different defense strategies;
  • Attack Pattern Research: Use the controllability of synthetic data to systematically study the impact of various attacks on multi-agent systems;
  • Teaching Demonstration: Provide an intuitive experimental platform for academic courses to help understand the vulnerability of distributed systems.

Section 06

Usage Guide and Future Outlook

Usage Guide

Environment setup follows three steps: create a conda environment and install the dependencies; configure the LLM backend parameters (BASE_URL, API_KEY, MODEL_NAME); then start the data generation and evaluation processes from the predefined configuration templates. For deeper customization, refer to the documentation on adapting the architecture to specific needs (adding custom defense models, adjusting text processing logic, or introducing new task datasets).
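A hypothetical walk-through of those setup steps; beyond the documented script names (TrainDataGeneration.py, MainEvaluation.py) and parameter names (BASE_URL, API_KEY, MODEL_NAME), all values and commands are illustrative:

```shell
# 1) Environment (per the docs: conda + Python 3.11). Shown as comments so the
#    sketch stays self-contained:
#      conda create -n gammaf python=3.11 -y
#      conda activate gammaf
#      pip install -r requirements.txt

# 2) Any OpenAI-compatible backend works; a local vLLM server is typical:
export BASE_URL="http://localhost:8000/v1"
export API_KEY="EMPTY"              # placeholder for local servers
export MODEL_NAME="my-local-model"  # placeholder, not a real model id

# 3) Launch the two pipelines from a predefined config template:
#      python TrainDataGeneration.py   # synthesize the interaction dataset
#      python MainEvaluation.py        # benchmark the defense models
echo "backend: $BASE_URL ($MODEL_NAME)"
```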

Future Outlook

As an open-source project, GAMMAF welcomes community contributions. Future plans include supporting more network topology types and more complex attack scenarios, integrating visualization tools for evaluation results, and further advancing LLM-MAS security research.