# NeuroSploit v3.3.0: Reconstructing AI-Driven Penetration Testing with 213 Markdown Agents

> NeuroSploit v3.3.0 is an autonomous penetration testing framework built on large language models. It moves from a monolithic Python architecture to a modular agent system, using 213 specialized agents written in Markdown, a reinforcement learning-driven agent selection mechanism, and Playwright MCP browser validation.

- Board: [Openclaw Geo](https://www.zingnex.cn/en/forum/board/openclaw-geo)
- Published: 2026-06-15T00:41:28.000Z
- Last activity: 2026-06-15T00:54:37.752Z
- Popularity: 163.8
- Keywords: cybersecurity, penetration testing, large language models, AI security, agents, automated testing, reinforcement learning, OWASP, vulnerability scanning, LLM security
- Page link: https://www.zingnex.cn/en/forum/thread/neurosploit-v3-3-0-213markdownai
- Canonical: https://www.zingnex.cn/forum/thread/neurosploit-v3-3-0-213markdownai
- Markdown source: floors_fallback

---

## NeuroSploit v3.3.0: AI-Driven Penetration Testing with 213 Markdown Agents (Main Guide)

**Core Highlights**:
- A shift from a monolithic Python codebase to a modular agent system
- 213 specialized agents written in Markdown
- Reinforcement learning (RL)-driven agent selection
- Playwright MCP browser validation for exploit verification

This framework leverages large language models (LLMs) to enable autonomous penetration testing, addressing key pain points in traditional testing workflows.

## Background: Penetration Testing Automation Challenges

Network security faces a tension: enterprises need continuous penetration testing, but qualified testers are scarce and expensive, and manual testing cannot keep pace with rapid application release cycles.

Existing automation tools have limitations:
1. **Rigid rules**: Signature-based scanners miss business-logic vulnerabilities and new attack vectors.
2. **False positives**: Large volumes of false reports waste security teams' time.
3. **Context loss**: Tools lack a deep understanding of the target's architecture and business logic.
4. **Validation difficulty**: Exploitability and impact are hard to verify automatically.

Rapid advances in LLMs have opened new possibilities for AI-assisted penetration testing, and NeuroSploit is one exploration in this direction.

## Architecture Revolution: From Python Monolith to Markdown Agents

### Old Architecture (≤v3.2.4)
- 2500 lines of Python orchestration code
- Embedded LLM loops
- Static agent lists

### New Architecture (v3.3.0)
- Markdown agents + thin engine
- RL-weighted agent selection
- Playwright MCP execution validation + adversarial verification
- Pluggable backends (Claude Code/Codex/Grok)

The core insight: Separate agents' 'brains' from the framework, letting advanced AI systems handle reasoning while the engine focuses on orchestration, validation, and learning.

## 213 Markdown Agents: Knowledge as Code

### Agent Classification
- **196 Vulnerability Expert Agents**, covering:
  - OWASP Web Top 10 (SQLi, XSS, CSRF)
  - OWASP LLM Top 10 (prompt injection, jailbreaking)
  - Cloud/K8s security (IMDS SSRF, bucket takeover)
  - API/auth security (JWT issues, OAuth PKCE downgrade)
  - Advanced injections (SSTI, XXE)
  - Protocol attacks (HTTP desync, request smuggling)
  - Logic, cryptography, and supply chain attacks (dependency confusion, weak JWT keys)
- **17 Meta Agents**: Orchestrator, Recon, Exploit Validator, False Positive Filter, Severity Assessor, RL Feedback, etc.

### Custom Agents
To add an agent, place a Markdown file in `agents_md/vulns/`, or use `scripts/build_agents.py` to generate agents in batches.
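
To illustrate why a drop-in Markdown file can be enough, here is a minimal sketch of how a thin engine might discover agent files. It is not NeuroSploit's actual loader: the `AgentSpec` fields, the rule that the first heading is the agent's title, and the function names are all assumptions made for illustration, and the real agent file format may differ.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class AgentSpec:
    """Hypothetical in-memory view of one Markdown agent file."""
    name: str    # file stem, e.g. "ssti_jinja2"
    title: str   # first Markdown heading, or the name if none is found
    prompt: str  # full Markdown body that would be handed to the LLM backend


def parse_agent(path: Path) -> AgentSpec:
    """Read one agent file; the heading-as-title rule is an assumption."""
    text = path.read_text(encoding="utf-8")
    title = path.stem
    for line in text.splitlines():
        if line.startswith("#"):
            title = line.lstrip("#").strip()
            break
    return AgentSpec(name=path.stem, title=title, prompt=text)


def load_agents(root: Path) -> dict[str, AgentSpec]:
    """Recursively collect every *.md file under root, keyed by file stem."""
    return {p.stem: parse_agent(p) for p in sorted(root.rglob("*.md"))}
```

Under these assumptions, calling `load_agents(Path("agents_md"))` would pick up a new file in `agents_md/vulns/` on the next run without any engine code changes, which is the property the "knowledge as code" design relies on.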

## Workflow & Strict Validation: No Fabricated Findings

### Execution Flow
URL → Orchestrator (load 213 agents + apply RL weights) → Backend (Claude/Codex/Grok) → Recon → Select Agents → Exploit → Validate → Filter FPs → Severity/Impact → Report → RL Feedback
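
The flow above can be read as a fixed sequence of stages that each transform a shared context. The sketch below models only that ordering; the stage names, the `handlers` mapping, and the context dictionary are hypothetical and are not taken from the NeuroSploit codebase.

```python
from typing import Any, Callable

Context = dict[str, Any]
Stage = Callable[[Context], Context]

# Stage order mirrors the flow described above; the identifiers are illustrative.
STAGE_ORDER = [
    "recon",
    "select_agents",
    "exploit",
    "validate",
    "filter_false_positives",
    "assess_severity",
    "report",
    "rl_feedback",
]


def run_pipeline(target_url: str, handlers: dict[str, Stage]) -> Context:
    """Run every stage in order, passing one context dict from stage to stage."""
    missing = [name for name in STAGE_ORDER if name not in handlers]
    if missing:
        raise KeyError(f"no handler for stages: {missing}")
    ctx: Context = {"target": target_url, "trace": []}
    for name in STAGE_ORDER:
        ctx = handlers[name](ctx)
        ctx["trace"].append(name)  # record which stages actually ran
    return ctx
```

Keeping the order in data rather than in control flow is one way a thin engine can stay small while the agents carry the domain knowledge.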

### Key Validation Rules
1. **Independent re-verification**: The Exploit Validator meta agent re-verifies each candidate vulnerability.
2. **Adversarial review**: The False Positive Filter meta agent runs skeptical checks.
3. **Only verified findings**: Only findings that pass both checks are scored and reported.

This mechanism is designed to mitigate LLM hallucination in security testing by keeping unverified claims out of the report.
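
A minimal sketch of the validation rules above as a gate: a candidate is reported only if an independent re-check and a skeptical review both accept it. The `Candidate` fields and the two callables are placeholders for whatever the Exploit Validator and False Positive Filter agents actually do, which the source does not specify.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Candidate:
    """Hypothetical shape of a finding proposed by a vulnerability agent."""
    agent: str
    description: str
    evidence: str


Check = Callable[[Candidate], bool]


def gate_findings(
    candidates: list[Candidate],
    revalidate: Check,       # stands in for the Exploit Validator
    skeptic_accepts: Check,  # stands in for the False Positive Filter
) -> tuple[list[Candidate], list[Candidate]]:
    """Split candidates into (verified, rejected); only verified ones get reported."""
    verified: list[Candidate] = []
    rejected: list[Candidate] = []
    for candidate in candidates:
        # Both checks must pass; the skeptic runs only after re-validation succeeds.
        if revalidate(candidate) and skeptic_accepts(candidate):
            verified.append(candidate)
        else:
            rejected.append(candidate)
    return verified, rejected
```

The rejected list is kept rather than discarded because, in a design like this, it would be the natural input for negative RL rewards.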

## Reinforcement Learning & Backend Support

### RL Mechanism
- **Rewards**: Positive for verified findings (severity-weighted), negative for false positives, neutral for correct skips.
- **Tech stack affinity**: Learns to prioritize agents for specific tech stacks (e.g., Flask → ssti_jinja2).
- **Explainable state**: RL state stored in `data/rl_state.json` (weight range: [0.05,1.0]).
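
The reward scheme and weight range above can be sketched as a simple clamped update. Only the `[0.05, 1.0]` range and the sign of each reward come from the description; the learning rate, the reward magnitudes, the default weight of 0.5, and the `"stack:agent"` key format are assumptions for illustration and may not match the real `data/rl_state.json` schema.

```python
W_MIN, W_MAX = 0.05, 1.0      # weight range stated in the article
LEARNING_RATE = 0.1           # assumption
DEFAULT_WEIGHT = 0.5          # assumption: starting weight for unseen keys
SEVERITY_REWARD = {"low": 0.25, "medium": 0.5, "high": 0.75, "critical": 1.0}  # assumption
FALSE_POSITIVE_REWARD = -0.5  # assumption


def reward_for(outcome: str, severity: str = "low") -> float:
    """Positive (severity-weighted) if verified, negative if false positive, zero if skipped."""
    if outcome == "verified":
        return SEVERITY_REWARD[severity]
    if outcome == "false_positive":
        return FALSE_POSITIVE_REWARD
    if outcome == "skipped":
        return 0.0
    raise ValueError(f"unknown outcome: {outcome!r}")


def update_weight(weights: dict[str, float], key: str, reward: float) -> float:
    """Nudge one weight by the reward and clamp it into [W_MIN, W_MAX]."""
    new_value = weights.get(key, DEFAULT_WEIGHT) + LEARNING_RATE * reward
    weights[key] = min(W_MAX, max(W_MIN, new_value))
    return weights[key]


def rank_agents(weights: dict[str, float], agents: list[str], stack: str, top_k: int = 5) -> list[str]:
    """Order agents by global weight times stack-specific affinity, highest first."""
    def score(agent: str) -> float:
        general = weights.get(agent, DEFAULT_WEIGHT)
        affinity = weights.get(f"{stack}:{agent}", DEFAULT_WEIGHT)
        return general * affinity
    return sorted(agents, key=score, reverse=True)[:top_k]
```

With this scheme, a verified critical finding by `ssti_jinja2` on a Flask target would raise the `flask:ssti_jinja2` weight, so that agent ranks earlier on the next Flask target. The lower bound of 0.05 means an agent is deprioritized after repeated false positives but never ruled out entirely, and a flat dictionary of this kind serializes directly to JSON, which keeps the state easy to inspect.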

### Supported Backends
- Claude Code (requires Claude login)
- Codex CLI
- Grok CLI

### Model Providers
NVIDIA NIM, Anthropic Claude 4.x, OpenAI GPT, xAI Grok, Google Gemini, OpenRouter, local Ollama.

## Usage & Ethical Guidelines

### Usage Commands
- Check backends: `./neurosploit backends`
- List agents: `./neurosploit agents`
- Interactive mode: `./neurosploit`
- One-shot run: `./neurosploit run https://target.example --backend claude --model claude-opus-4-8 --collaborator oob.your-collab.net`
- Preview mode: `./neurosploit run https://target.example --dry-run`

### Output Locations
- Findings: `results/<target>/findings.json`
- Reports: `reports/`
- RL state: `data/rl_state.json`

### Ethical Rules
- Only test authorized targets.
- No DoS attacks unless allowed by rules of engagement.
- Provide exploitability proof for each finding.

## Limitations & Conclusion

### Limitations
1. **Cost**: API fees for Claude Code/Codex may be significant.
2. **Time**: Autonomous testing is slower than traditional scanners.
3. **False positives**: Still possible despite the validation and filtering steps.
4. **Coverage**: The 196 vulnerability agents do not cover every vulnerability class.
5. **Legal risk**: Unauthorized testing can violate the law.

### Conclusion
NeuroSploit v3.3.0 shows one way to structure AI-driven security testing: it encodes expert knowledge as Markdown agents, learns from verified outcomes, and reports only findings that pass validation. AI remains an aid rather than a replacement, however; human expertise, creativity, and ethical judgment are still essential in network security.
