Zing Forum

RuleForge: How AWS Uses LLMs to Automate Vulnerability Detection Rule Generation and Reduce False Positives by 67%

RuleForge, an internal AWS system, uses an LLM-as-a-Judge validation mechanism and a 5x5 generation strategy to automatically generate JSON detection rules from Nuclei templates. It reduces the false positive rate by 67% compared with validation based on synthetic testing alone, while maintaining a high detection rate.

Vulnerability Detection | LLM | AWS | RuleForge | Automation | Security | CVE | Nuclei | False Positive Rate | LLM-as-a-Judge
Published 2026-04-02 20:39 | Recent activity 2026-04-03 09:18 | Estimated read 7 min

Section 01

RuleForge Overview: AWS Uses LLMs to Automate Vulnerability Detection Rule Generation, Reducing False Positives by 67%

Key Takeaways of RuleForge

RuleForge, an internal AWS system, uses an LLM-as-a-Judge validation mechanism and a 5x5 generation strategy to automatically generate JSON vulnerability detection rules from Nuclei templates. While maintaining a high detection rate, the system reduces the false positive rate by 67%. This addresses a problem of scale: detection rule development cannot keep up with the pace of vulnerability disclosure.

Section 02

Background: The Large-Scale Dilemma of Vulnerability Detection

In 2025, the U.S. National Vulnerability Database (NVD) published over 48,000 new vulnerabilities. Security teams that write detection rules by hand fall far behind this pace of disclosure. The traditional manual approach depends on expert experience, is slow, and is prone to omissions and errors caused by fatigue. The industry needs a way to generate high-quality rules automatically and at scale.

Section 03

Methodology: RuleForge's Core Architecture and 5x5 Generation Strategy

Core Architecture

The RuleForge workflow: take a Nuclei template as input → extract key vulnerability features → generate candidate detection rules → validate quality along multiple dimensions → output the final JSON rules.

5x5 Generation Strategy

  • Generate 5 candidate rules in parallel to take advantage of the diversity of LLM outputs;
  • Refine each candidate rule for up to 5 rounds of iterative optimization to fix defects;
  • Feed validation results back into generation, forming a closed improvement loop.
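
The strategy above can be sketched as a pair of nested loops. This is a minimal illustration, not RuleForge's implementation: the names `five_by_five`, `Candidate`, `toy_generate`, and `toy_validate` are hypothetical, the generator and validator are toy stand-ins for the LLM calls, and candidates are produced sequentially here for simplicity even though the article describes them as generated in parallel.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Candidate:
    rule: dict           # candidate detection rule (shape is hypothetical)
    score: float = 0.0   # validation score in [0, 1]
    feedback: str = ""   # validator notes passed to the next round


def five_by_five(
    generate: Callable[[str, Optional[Candidate]], dict],
    validate: Callable[[dict], Tuple[float, str]],
    template: str,
    n_candidates: int = 5,
    max_rounds: int = 5,
    pass_score: float = 1.0,
) -> Candidate:
    """Return the best-scoring rule seen across all candidates and rounds."""
    best: Optional[Candidate] = None
    for _ in range(n_candidates):
        current: Optional[Candidate] = None
        for _ in range(max_rounds):
            # The previous attempt (with its feedback) closes the loop.
            rule = generate(template, current)
            score, feedback = validate(rule)
            current = Candidate(rule, score, feedback)
            if best is None or current.score > best.score:
                best = current
            if score >= pass_score:
                break  # this candidate needs no further refinement
    assert best is not None
    return best


# Toy stand-ins for the LLM generator and the validator (assumptions).
def toy_generate(template: str, previous: Optional[Candidate]) -> dict:
    matchers = list(previous.rule["matchers"]) if previous else []
    matchers.append(f"matcher-{len(matchers) + 1}")
    return {"source": template, "matchers": matchers}


def toy_validate(rule: dict) -> Tuple[float, str]:
    score = min(len(rule["matchers"]) / 3, 1.0)
    return score, "ok" if score >= 1.0 else "add another matcher"


best = five_by_five(toy_generate, toy_validate, "example-template")
print(best.score, best.rule["matchers"])
```

With these toy functions each candidate passes on its third round, so the loop stops refining early. A real validator would return richer feedback than a single string.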

Section 04

Evidence: Effectiveness of the LLM-as-a-Judge Validation Mechanism

RuleForge introduces an LLM-as-a-Judge step that evaluates each rule along two dimensions:

  • Sensitivity: the rule must capture real attack traffic, avoiding false negatives;
  • Specificity: the rule must not flag normal traffic as an attack, avoiding false positives.

With this mechanism the system achieves an AUROC of 0.75 (area under the ROC curve, where 0.5 corresponds to chance and 1.0 to perfect separation) and reduces the false positive rate by 67% compared with methods that use only synthetic testing, allowing security teams to focus on real threats.
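
For readers less familiar with these metrics, the sketch below shows how sensitivity, specificity, a relative false positive reduction, and AUROC are computed. All counts and scores are made-up illustrative values chosen to reproduce the headline figures; they are not RuleForge data, and the function names are assumptions.

```python
from typing import Sequence


def sensitivity(tp: int, fn: int) -> float:
    """Share of real attacks that the rule catches."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of normal traffic that the rule correctly ignores."""
    return tn / (tn + fp)


def relative_reduction(before: float, after: float) -> float:
    """Fractional drop from a baseline value to a new value."""
    return (before - after) / before


def auroc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is equivalent to the
    area under the ROC curve.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Illustrative numbers only.
print(sensitivity(tp=90, fn=10))              # 0.9
print(specificity(tn=950, fp=50))             # 0.95
print(relative_reduction(before=300, after=99))  # 0.67, i.e. a 67% reduction
good_rule_scores = [0.9, 0.6, 0.55, 0.3]      # judge confidence for sound rules
bad_rule_scores = [0.7, 0.5, 0.2, 0.1]        # judge confidence for flawed rules
print(auroc(good_rule_scores, bad_rule_scores))  # 0.75
```

A 67% reduction is relative: in this example 300 false positives fall to 99, not to a false positive rate of 33%.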

Section 05

Extension Capabilities and Practical Experience

Extension Capabilities

  • Explore rule generation from unstructured data sources (security advisories, vulnerability reports, etc.);
  • Validate detection across multiple event types, in order to identify complex attack chains and combined threats.

Practical Lessons

  • LLMs tend to be overconfident, so independent validation mechanisms are required;
  • Domain experts are indispensable in prompt design and result review;
  • Human-machine collaboration is currently the most effective model: LLMs are tools, not replacements.

Section 06

Technical Details: JSON Rules and Integrated Deployment

RuleForge outputs rules in JSON for the following reasons:

  • Parsability: Facilitates programmatic processing and integration;
  • Standardization: Unified structure for easy management and version control;
  • Performance: Optimized JSON parsing, suitable for high-throughput detection scenarios.
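
To make the parsability point concrete, here is a small sketch of loading and applying a JSON rule. The article does not describe RuleForge's actual rule schema, so every field name here (`id`, `cve`, `conditions`, `field`, `contains`), the placeholder CVE identifier, and the helpers `load_rule` and `matches` are hypothetical.

```python
import json

# Hypothetical schema: field name -> expected Python type after parsing.
REQUIRED_FIELDS = {"id": str, "cve": str, "conditions": list}


def load_rule(text: str) -> dict:
    """Parse a JSON rule and check that the required fields are present."""
    rule = json.loads(text)
    if not isinstance(rule, dict):
        raise ValueError("rule must be a JSON object")
    for name, expected in REQUIRED_FIELDS.items():
        if not isinstance(rule.get(name), expected):
            raise ValueError(f"field {name!r} missing or not {expected.__name__}")
    return rule


def matches(rule: dict, event: dict) -> bool:
    """True when every condition's substring appears in the named event field."""
    return all(
        cond["contains"] in str(event.get(cond["field"], ""))
        for cond in rule["conditions"]
    )


raw = """
{
  "id": "example-rule-001",
  "cve": "CVE-0000-0000",
  "conditions": [
    {"field": "path", "contains": "/admin/debug"},
    {"field": "method", "contains": "POST"}
  ]
}
"""

rule = load_rule(raw)
print(matches(rule, {"method": "POST", "path": "/admin/debug?x=1"}))  # True
print(matches(rule, {"method": "GET", "path": "/index.html"}))        # False
```

Because the rule is plain data, it can be diffed, versioned, and schema-checked before deployment without executing anything.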

The system is deeply integrated with AWS's internal detection infrastructure, allowing generated rules to be directly deployed to production, shortening the time window from vulnerability disclosure to protection.

Section 07

Conclusions and Industry Implications

RuleForge represents an important direction for security operations automation. A purely manual rule development model is no longer sustainable, and a hybrid model that combines automated generation with intelligent validation may become mainstream.

Implications for security teams:

  1. Build an automated rule generation process suitable for your own environment;
  2. Design effective validation mechanisms to ensure rule quality;
  3. Find the right balance between automation and manual review.

LLMs have great potential in cybersecurity, but they must be combined with careful system design, strict validation, and continuous iterative optimization.