Zing Forum

NeuroSploit v3.3.0: Reconstructing AI-Driven Penetration Testing with 213 Markdown Agents

NeuroSploit v3.3.0 is an autonomous penetration testing framework based on large language models. It replaces its earlier monolithic Python architecture with a modular agent system organized around three pillars: 213 Markdown-formatted specialist agents, reinforcement learning-driven agent selection, and browser-based validation through Playwright MCP.

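Tags: Network security · Penetration testing · Large language models · AI security · AI agents · Automated testing · Reinforcement learning · OWASP · Vulnerability scanning · LLM security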
Published 2026-06-15 08:41 · Recent activity 2026-06-15 08:54 · Estimated read 9 min

Section 01

NeuroSploit v3.3.0: AI-Driven Penetration Testing with 213 Markdown Agents (Main Guide)

Core Highlights:

  • A paradigm shift from Python monolith to modular agent system
  • 213 Markdown-formatted professional agents
  • Reinforcement learning (RL)-driven agent selection mechanism
  • Playwright MCP browser validation for exploit verification

This framework leverages large language models (LLMs) to enable autonomous penetration testing, addressing key pain points in traditional testing workflows.

Section 02

Background: Penetration Testing Automation Challenges

The security field faces a basic tension: enterprises need continuous penetration testing, but qualified testers are scarce and expensive, and manual testing cannot keep pace with rapid application release cycles.

Existing automation tools have limitations:

  1. Rigid rules: Signature-based scanners miss logic vulnerabilities and novel attack vectors.
  2. False positives: Large volumes of false reports waste security teams' time.
  3. Context loss: Tools lack a deep understanding of the target's architecture and business logic.
  4. Validation difficulty: It is hard to verify exploitability and impact automatically.

The rapid advancement of LLMs has opened new possibilities for integrating AI into penetration testing, and NeuroSploit is a notable exploration in this direction.

Section 03

Architecture Revolution: From Python Monolith to Markdown Agents

Old Architecture (≤v3.2.4)

  • 2500 lines of Python orchestration code
  • Embedded LLM loops
  • Static agent lists

New Architecture (v3.3.0)

  • Markdown agents + thin engine
  • RL-weighted agent selection
  • Playwright MCP execution validation + adversarial verification
  • Pluggable backends (Claude Code/Codex/Grok)

The core insight is to separate the agents' 'brains' from the framework: advanced AI systems handle the reasoning, while the engine focuses on orchestration, validation, and learning.
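
To illustrate the "thin engine, pluggable backend" split, here is a minimal Python sketch. The Backend protocol, its complete() method, the EchoBackend stub, and the get_backend() registry are hypothetical names chosen for illustration, not NeuroSploit's actual interfaces; a real backend would hand the prompt to Claude Code, Codex, or Grok instead of echoing it.

```python
from typing import Callable, Dict, Protocol


class Backend(Protocol):
    """Hypothetical interface: anything that turns a prompt into a text answer."""

    def complete(self, prompt: str) -> str: ...


class EchoBackend:
    """Stand-in backend for testing; a real one would call an external CLI or API."""

    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


# The engine only knows backend names and factories, not how reasoning happens.
BACKENDS: Dict[str, Callable[[], Backend]] = {
    "claude": lambda: EchoBackend("claude"),
    "codex": lambda: EchoBackend("codex"),
    "grok": lambda: EchoBackend("grok"),
}


def get_backend(name: str) -> Backend:
    """Look up a backend by name, failing loudly on unknown names."""
    try:
        return BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unknown backend: {name!r}") from None


if __name__ == "__main__":
    print(get_backend("claude").complete("recon https://target.example"))
```

Swapping reasoning engines then means registering one more factory; the orchestration, validation, and learning code never changes.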

Section 04

213 Markdown Agents: Knowledge as Code

Agent Classification

  • 196 Vulnerability Expert Agents: Covering the OWASP Top 10 for web (SQLi, XSS, CSRF), the OWASP Top 10 for LLM applications (prompt injection, jailbreaking), cloud/Kubernetes security (IMDS SSRF, bucket takeover), API and authentication security (JWT flaws, OAuth PKCE downgrade), advanced injection (SSTI, XXE), protocol attacks (HTTP desync, request smuggling), and logic, cryptography, and supply-chain attacks (dependency confusion, weak JWT keys).
  • 17 Meta Agents: Orchestrator, Recon, Exploit Validator, False Positive Filter, Severity Assessor, RL Feedback, etc.

Custom Agents

New agents are easy to add: place a Markdown file in agents_md/vulns/, or use scripts/build_agents.py to generate agents in batches.
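
Because agents are plain Markdown, loading them can be as simple as reading files. The Python sketch below assumes, hypothetically, that each file in agents_md/vulns/ may begin with a "# Title" heading followed by the agent's instructions; NeuroSploit's real file format may differ, and Agent, parse_agent(), and load_agents() are illustrative names only.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class Agent:
    name: str          # derived from the file name, e.g. "ssti_jinja2"
    title: str         # first "# " heading if present, else the file name
    instructions: str  # everything after the heading


def parse_agent(path: Path) -> Agent:
    lines = path.read_text(encoding="utf-8").splitlines()
    title, body_start = path.stem, 0
    # Assumed format: an optional leading "# Title" line.
    if lines and lines[0].startswith("# "):
        title, body_start = lines[0][2:].strip(), 1
    instructions = "\n".join(lines[body_start:]).strip()
    return Agent(name=path.stem, title=title, instructions=instructions)


def load_agents(directory: Path) -> List[Agent]:
    # Sorted so agent order is deterministic across runs.
    return [parse_agent(p) for p in sorted(directory.glob("*.md"))]


if __name__ == "__main__":
    for agent in load_agents(Path("agents_md/vulns")):
        print(agent.name, "-", agent.title)
```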

Section 05

Workflow & Strict Validation: No Fabricated Findings

Execution Flow

URL → Orchestrator (load 213 agents + apply RL weights) → Backend (Claude/Codex/Grok) → Recon → Select Agents → Exploit → Validate → Filter FPs → Severity/Impact → Report → RL Feedback
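
The flow above is essentially a fixed sequence of stages, each enriching a shared run context. A minimal, hypothetical Python sketch of that shape follows; the stage names mirror the flow, but each stage here is a placeholder that only records that it ran (real stages would call the backend and agents), and the context keys are assumptions.

```python
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
Stage = Callable[[Context], Context]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage that only records its name in the trace."""

    def stage(ctx: Context) -> Context:
        ctx.setdefault("trace", []).append(name)
        return ctx

    return stage


# Order mirrors the documented flow once the backend has been chosen.
PIPELINE: List[Stage] = [
    make_stage(n)
    for n in ("recon", "select_agents", "exploit", "validate",
              "filter_false_positives", "assess_severity", "report", "rl_feedback")
]


def run_pipeline(target: str, backend: str, stages: List[Stage] = PIPELINE) -> Context:
    ctx: Context = {"target": target, "backend": backend}
    for stage in stages:  # each stage sees everything earlier stages produced
        ctx = stage(ctx)
    return ctx


if __name__ == "__main__":
    result = run_pipeline("https://target.example", "claude")
    print(" -> ".join(result["trace"]))
```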

Key Validation Rules

  1. Independent re-verification: The Meta Exploit Validator re-checks each candidate vulnerability.
  2. Adversarial review: The Meta False Positive Filter runs skeptical checks against each candidate.
  3. Verified findings only: Only results that pass both checks are scored and reported.

This mechanism targets the LLM 'hallucination' problem in security testing: a candidate that fails re-verification or adversarial review never reaches the report.
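
The "verified findings only" rule amounts to a double gate: a candidate is kept only if every check accepts it. The Python sketch below is hypothetical; Finding, its payload/evidence fields, and both check functions are simplified stand-ins for the Exploit Validator and False Positive Filter meta agents, not their real logic.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Finding:
    vuln: str
    payload: str   # what was sent to the target
    evidence: str  # what came back and was offered as proof


Check = Callable[[Finding], bool]


def revalidate(f: Finding) -> bool:
    # Stand-in for independent re-verification: demand non-empty proof.
    return bool(f.evidence.strip())


def skeptical_review(f: Finding) -> bool:
    # Stand-in adversarial check: a response that merely echoes the payload
    # shows reflection, not a working exploit.
    return f.evidence.strip() != f.payload.strip()


def verified_only(candidates: List[Finding],
                  checks: Sequence[Check] = (revalidate, skeptical_review)) -> List[Finding]:
    # A finding survives only if every gate accepts it.
    return [f for f in candidates if all(check(f) for check in checks)]


if __name__ == "__main__":
    found = [
        Finding("ssti", "{{7*7}}", "49"),
        Finding("sqli", "' OR 1=1--", "' OR 1=1--"),
        Finding("xxe", "<!DOCTYPE x>", ""),
    ]
    print([f.vuln for f in verified_only(found)])
```

Keeping the checks as a sequence of independent predicates makes it easy to add further skeptical reviews without touching the reporting step.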

Section 06

Reinforcement Learning & Backend Support

RL Mechanism

  • Rewards: Positive for verified findings (weighted by severity), negative for false positives, and neutral for correct skips.
  • Tech-stack affinity: Learns which agents to prioritize for a given tech stack (e.g., Flask → ssti_jinja2).
  • Explainable state: RL state is stored in data/rl_state.json, with weights in the range [0.05, 1.0].
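
As a worked illustration of the reward scheme, the Python sketch below applies a simple additive update per (tech stack, agent) pair and clamps each weight to the documented [0.05, 1.0] range. The severity rewards, learning rate, false-positive penalty, starting weight of 0.5, and the "stack:agent" key layout are all hypothetical; the source does not describe the actual update rule or the schema of data/rl_state.json.

```python
import json
from pathlib import Path
from typing import Dict

W_MIN, W_MAX = 0.05, 1.0       # documented weight range
LEARNING_RATE = 0.1            # assumption
INITIAL_WEIGHT = 0.5           # assumption
FALSE_POSITIVE_PENALTY = -0.5  # assumption
SEVERITY_REWARD = {"critical": 1.0, "high": 0.75, "medium": 0.5, "low": 0.25}  # assumption


def reward(outcome: str, severity: str = "low") -> float:
    if outcome == "verified":
        return SEVERITY_REWARD[severity]  # severity-weighted positive reward
    if outcome == "false_positive":
        return FALSE_POSITIVE_PENALTY     # negative reward
    return 0.0                            # correct skip: neutral


def update(weights: Dict[str, float], stack: str, agent: str, r: float) -> float:
    # Keying by stack lets affinities such as flask -> ssti_jinja2 be learned separately.
    key = f"{stack}:{agent}"
    w = weights.get(key, INITIAL_WEIGHT) + LEARNING_RATE * r
    weights[key] = min(W_MAX, max(W_MIN, w))  # clamp into [W_MIN, W_MAX]
    return weights[key]


def save(weights: Dict[str, float], path: Path) -> None:
    # Plain, sorted JSON keeps the learned state human-readable.
    path.write_text(json.dumps(weights, indent=2, sort_keys=True), encoding="utf-8")


if __name__ == "__main__":
    state: Dict[str, float] = {}
    update(state, "flask", "ssti_jinja2", reward("verified", "critical"))
    update(state, "flask", "xxe", reward("false_positive"))
    print(state)
```

The floor of 0.05 means a repeatedly unproductive agent is deprioritized but never switched off entirely, so it can still earn weight back on a stack where it pays off.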

Supported Backends

  • Claude Code (requires Claude login)
  • Codex CLI
  • Grok CLI

Model Providers

NVIDIA NIM, Anthropic Claude 4.x, OpenAI GPT, xAI Grok, Google Gemini, OpenRouter, and local models via Ollama.

Section 07

Usage & Ethical Guidelines

Usage Commands

  • Check backends: ./neurosploit backends
  • List agents: ./neurosploit agents
  • Interactive mode: ./neurosploit
  • One-command run: ./neurosploit run https://target.example --backend claude --model claude-opus-4-8 --collaborator oob.your-collab.net
  • Preview mode: ./neurosploit run https://target.example --dry-run
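
For scripted or CI use, the documented run invocation can be assembled programmatically. This Python sketch uses only the flags shown above (--backend, --model, --collaborator, --dry-run); build_run_command() is a hypothetical helper, and it assumes the ./neurosploit launcher sits in the current directory. Building an argument list rather than a shell string avoids quoting problems with URLs.

```python
import shlex
from typing import List, Optional


def build_run_command(target: str, backend: Optional[str] = None,
                      model: Optional[str] = None, collaborator: Optional[str] = None,
                      dry_run: bool = False) -> List[str]:
    """Assemble argv for `./neurosploit run`, adding only the flags that are set."""
    cmd = ["./neurosploit", "run", target]
    if backend:
        cmd += ["--backend", backend]
    if model:
        cmd += ["--model", model]
    if collaborator:
        cmd += ["--collaborator", collaborator]  # out-of-band callback host
    if dry_run:
        cmd.append("--dry-run")                  # preview without executing
    return cmd


if __name__ == "__main__":
    cmd = build_run_command("https://target.example", backend="claude", dry_run=True)
    print(shlex.join(cmd))
    # To execute against an authorized target: subprocess.run(cmd, check=True)
```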

Output Locations

  • Findings: results/<target>/findings.json
  • Reports: reports/
  • RL state: data/rl_state.json
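
To triage output, a short script can tally findings by severity. The schema of findings.json is not documented here, so this Python sketch assumes it holds a JSON list of objects with a severity field; summarize() and the severity labels are hypothetical and should be adapted to the real file layout.

```python
import json
import sys
from collections import Counter
from pathlib import Path
from typing import Dict

SEVERITY_ORDER = ["critical", "high", "medium", "low", "info"]  # assumed labels


def summarize(findings_path: Path) -> Dict[str, int]:
    findings = json.loads(findings_path.read_text(encoding="utf-8"))
    # Assumed schema: [{"severity": "high", ...}, ...]; missing values count as "unknown".
    counts = Counter(str(f.get("severity", "unknown")).lower() for f in findings)
    # Report known severities first (most severe first), then anything unexpected.
    ordered = {s: counts[s] for s in SEVERITY_ORDER if s in counts}
    ordered.update({s: n for s, n in counts.items() if s not in ordered})
    return ordered


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: summarize.py results/<target>/findings.json")
    for severity, n in summarize(Path(sys.argv[1])).items():
        print(f"{severity:>8}: {n}")
```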

Ethical Rules

  • Only test authorized targets.
  • No DoS attacks unless allowed by rules of engagement.
  • Provide exploitability proof for each finding.

Section 08

Limitations & Conclusion

Limitations

  1. Cost: API fees for Claude Code/Codex may be significant.
  2. Time: Autonomous testing is slower than traditional scanners.
  3. False positives: Still possible despite filters.
  4. Coverage: Even 196 vulnerability agents cannot cover every vulnerability class.
  5. Legal risk: Unauthorized testing can violate the law.

Conclusion

NeuroSploit v3.3.0 is a significant step for AI-driven security testing: it packages expert knowledge as reusable agents, learns from every run, and gates findings behind strict validation. However, AI remains an enhancement; human expertise, creativity, and ethical judgment are still irreplaceable in network security.