# GAMMAF: A Graph Anomaly Detection Evaluation Framework for LLM Multi-Agent Systems

> GAMMAF is an open-source evaluation framework that focuses on generating synthetic interaction datasets for large language model (LLM)-based multi-agent systems and evaluating topology-guided defense methods against system integrity attacks.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-04-22T14:13:56.000Z
- Last activity: 2026-04-22T14:20:15.365Z
- Hotness: 137.9
- Keywords: LLM, multi-agent, anomaly detection, graph, security, benchmark
- Page link: https://www.zingnex.cn/en/forum/thread/gammaf-llm
- Canonical: https://www.zingnex.cn/forum/thread/gammaf-llm
- Markdown source: floors_fallback

---

## Introduction

GAMMAF is an open-source evaluation framework developed by the UC3M (Universidad Carlos III de Madrid) team. It focuses on generating synthetic interaction datasets for LLM multi-agent systems and evaluating topology-guided defense methods against system integrity attacks. It fills a tooling gap in LLM-MAS security evaluation and provides a standardized experimental platform for research in this area.

## Project Background and Motivation

With the rapid development of LLM multi-agent systems (LLM-MAS), agent collaboration and communication have become key capabilities for complex tasks. However, the distributed architecture brings new security challenges: malicious agents may inject false information or manipulate communication to undermine system integrity. Traditional security evaluation methods struggle to capture the complex interaction patterns of multi-agent systems, so there is an urgent need for dedicated datasets and evaluation tools to test the effectiveness of defense mechanisms. As a comprehensive evaluation architecture, GAMMAF aims to generate synthetic interaction datasets and benchmark defense models.

## Core Dual-Pipeline Architecture Design

GAMMAF adopts a dual-pipeline architecture:

### Training Data Generation Phase
It simulates debate scenarios under different network topologies, captures agent interaction behaviors, and represents them as attributed graphs that encode both communication content and topology. Users can configure the debate topic, number of agents, and topology type (fully connected, ring, star, random, etc.) to generate customized training data.
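The post does not show GAMMAF's actual generator code, so the following is a minimal pure-Python sketch of the topology step it describes; the function name and edge representation are assumptions for illustration.

```python
from itertools import combinations
import random

def build_topology(kind: str, n: int, p: float = 0.5, seed: int = 0) -> set:
    """Return undirected edges (i, j) with i < j over agents 0..n-1."""
    if kind == "fully_connected":
        return set(combinations(range(n), 2))
    if kind == "ring":
        # Each agent talks to its successor; normalize pairs so i < j.
        return {tuple(sorted((i, (i + 1) % n))) for i in range(n)}
    if kind == "star":
        return {(0, i) for i in range(1, n)}  # agent 0 is the hub
    if kind == "random":
        rng = random.Random(seed)
        return {e for e in combinations(range(n), 2) if rng.random() < p}
    raise ValueError(f"unknown topology: {kind}")

# An attributed graph would then pair this structure with message content, e.g.:
# graph = {"edges": build_topology("ring", 5), "node_attrs": {i: [] for i in range(5)}}
```

In a framework like this, the per-edge and per-node attributes would hold the captured communication content, while the edge set carries the topology information the defense models learn from.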

### Defense System Evaluation Phase
It dynamically evaluates defense models during real-time inference. When suspicious behavior is detected, it actively isolates the adversarial agent's node and observes how the remaining network collaborates, reflecting how the defense strategy would perform in actual deployment.
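The isolation step above can be sketched as graph pruning plus a reachability check; this is an illustrative stand-in (the function and its signature are not from GAMMAF), using breadth-first search to see which agents can still collaborate after flagged nodes are cut out.

```python
from collections import deque

def isolate_and_check(edges: set, n: int, flagged: set) -> set:
    """Drop flagged agents, then return the agents still reachable from the
    lowest-numbered surviving agent (BFS over the pruned graph)."""
    adj = {i: set() for i in range(n) if i not in flagged}
    for a, b in edges:
        if a not in flagged and b not in flagged:
            adj[a].add(b)
            adj[b].add(a)
    if not adj:
        return set()
    start = min(adj)
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u] - seen:
            seen.add(v)
            queue.append(v)
    return seen

# Isolating the hub of a star topology strands every remaining agent:
star = {(0, 1), (0, 2), (0, 3)}
print(isolate_and_check(star, 4, flagged={0}))  # → {1} (agents 2 and 3 unreachable)
```

The star example shows why the evaluation observes the post-isolation network: removing a central adversarial node can disconnect honest agents, so detection accuracy alone does not capture the real cost of a defense.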

## Technical Implementation and Extension Interfaces

GAMMAF is built on Python 3.11, uses conda for environment management, and is compatible with any inference service that conforms to the OpenAI API specification; for local deployment, vLLM is the recommended backend. The core scripts are `TrainDataGeneration.py` (training data generation) and `MainEvaluation.py` (defense evaluation), both of which take their parameters from YAML configuration files. The framework exposes extension interfaces for adding new defense models, text-processing logic, and task datasets, and its modular design makes it straightforward to integrate new algorithms and compare them against existing baselines.
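The post does not reproduce GAMMAF's configuration schema, so the fragment below is only a hedged sketch of what a YAML file for `TrainDataGeneration.py` might contain; every key name and value here is an assumption, not the project's actual format.

```yaml
# Hypothetical config sketch — key names are illustrative, not GAMMAF's schema.
llm:
  base_url: "http://localhost:8000/v1"   # any OpenAI-compatible endpoint, e.g. vLLM
  api_key: "EMPTY"
  model_name: "your-model-name"
debate:
  topic: "example debate topic"
  num_agents: 6
  topology: ring          # fully_connected | ring | star | random
  rounds: 3
attack:
  adversarial_agents: 1
output:
  dir: ./data/train
```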

## Application Scenarios and Research Value

The main application scenarios of GAMMAF are:
- **Defense Mechanism Development**: Use standardized datasets to test new anomaly detection algorithms and avoid repeated data collection;
- **Model Comparison Analysis**: Use unified metrics and environments to fairly compare the advantages and disadvantages of different defense strategies;
- **Attack Pattern Research**: Use the controllability of synthetic data to systematically study the impact of various attacks on multi-agent systems;
- **Teaching Demonstration**: Provide an intuitive experimental platform for academic courses to help understand the vulnerability of distributed systems.

## Usage Guide and Future Outlook

### Usage Guide
Environment setup: create a conda environment and install the dependencies; configure the LLM backend parameters (BASE_URL, API_KEY, MODEL_NAME); then start the data generation and evaluation processes from the predefined configuration templates. For deeper customization, consult the documentation to adapt the framework to specific needs: add custom defense models, adjust the text-processing logic, or introduce new task datasets.
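A setup session following those steps might look like the sketch below. The two script names come from the post; the environment name, endpoint values, and the assumption that dependencies live in a `requirements.txt` are illustrative guesses.

```shell
# Hypothetical setup sketch — verify file names and flags against the repo.
conda create -n gammaf python=3.11 -y
conda activate gammaf
pip install -r requirements.txt

# Point the framework at any OpenAI-compatible backend (a local vLLM server here)
export BASE_URL="http://localhost:8000/v1"
export API_KEY="EMPTY"
export MODEL_NAME="your-model-name"

python TrainDataGeneration.py   # generate attributed-graph training data
python MainEvaluation.py        # run the defense evaluation
```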

### Future Outlook
As an open-source project, GAMMAF welcomes community contributions. Planned work includes supporting more network topology types and more complex attack scenarios, integrating visualization tools to display evaluation results, and more broadly advancing LLM-MAS security research.
