Zing Forum

Reading

GemmaShield: A Localized AI Security Red Team Testing Platform Based on Gemma 4

GemmaShield is an open-source AI security testing platform that simulates adversarial attacks through four autonomous agents (attacker, target, defender, judge). It runs entirely on the local Gemma 4 model without needing cloud APIs, providing comprehensive security assessments for AI systems before deployment.

GemmaShieldGemma 4AI安全红队测试Ollama本地推理OWASP提示词注入对抗性攻击安全评估
Published 2026-05-18 18:12Recent activity 2026-05-18 18:50Estimated read 5 min
GemmaShield: A Localized AI Security Red Team Testing Platform Based on Gemma 4
1

Section 01

GemmaShield Guide: Core Introduction to the Localized AI Security Red Team Testing Platform

GemmaShield is an open-source AI security testing platform that simulates adversarial attacks via four autonomous agents (attacker, target, defender, judge). It runs on the local Gemma4 model (no cloud API required) to provide comprehensive security assessments for AI systems before deployment, addressing the pain points of existing solutions such as data privacy risks or lack of standard frameworks.

2

Section 02

Urgent Need for AI Security Testing and Current Challenges

With the application of large language models in sensitive fields like healthcare and finance, there is a lack of systematic adversarial testing before launch, exposing them to threats such as prompt injection and jailbreaking. Existing solutions relying on cloud APIs have privacy risks or no standardized assessment frameworks, and GemmaShield addresses these pain points specifically.

3

Section 03

GemmaShield Core Architecture: Four-Agent Collaborative Workflow

The core innovation lies in the collaboration of four agents (all driven by Gemma4 and running locally via Ollama): the attacker generates targeted adversarial attacks; the target simulates responses from real AI systems; the defender judges whether the attack is successful and classifies/scores it; the judge provides final CVSS scores, vulnerability classifications, and repair recommendations. The system uses React for the frontend + FastAPI for the backend, with SQLite and JSONL storing audit logs.

4

Section 04

Localized Privacy Protection and Alignment with OWASP Standards

100% local inference: all agents call the local Gemma4 via Ollama, so sensitive data never leaves the local environment. Attacks are automatically mapped to the OWASP LLM Top10 classifications (e.g., prompt injection corresponds to LLM01, jailbreaking to LLM02, etc.), and results comply with industry standards.

5

Section 05

Real-Scenario Simulation and Feature Highlights

Built-in six real scenarios including healthcare, banking, and law (each scenario has corresponding system prompts and compliance requirements); the attacker agent generates structured attacks (including type, prompt, method, etc.); provides a real-time visual battle console (showing execution status, OWASP classification, debugging information); generates a structured security report for each battle (PDF downloadable, including summary, vulnerability classification, repair recommendations, etc.).

6

Section 06

Tech Stack and Deployment Steps

Backend: Python3.10 + FastAPI; Frontend: React18 + Server-Sent Events; PDF reports generated client-side. Deployment requires an Ollama environment, steps: pull gemma4:latest, start the backend (uvicorn) and frontend (npm start).

7

Section 07

Open-Source Significance and Industry Impact

As an open-source project, it provides a reproducible and auditable benchmark solution, proving that local open-source models can perform complex security assessments. It offers low-threshold pre-deployment tools for organizations and an experimental platform for researchers, promoting the standardization and democratization of AI security testing.

8

Section 08

Conclusion: AI Security Testing Should Become a Standard Pre-Deployment Process

With the popularization of AI, security testing needs to be prioritized. GemmaShield provides a feasible tool with localized, standardized, and automated features. We look forward to the project's development and community contributions to promote the maturity and popularization of AI security testing.