PromeFuzz: A Knowledge-Driven Fuzz Testing Driver Generation Tool Based on Large Language Models

An ACM CCS 2025 research project that uses large language models to automatically generate fuzz testing drivers, improving software security testing efficiency

Tags: fuzz testing, large language models, software security, vulnerability discovery, ACM CCS, automated testing, fuzzing

Published 2026-04-29 15:35 | Recent activity 2026-04-29 15:48 | Estimated read 6 min

Section 01

[Introduction] PromeFuzz: A Knowledge-Driven, LLM-Based Fuzz Testing Tool

PromeFuzz is an ACM CCS 2025 research project. It targets two pain points in fuzz testing: writing drivers by hand is time-consuming, and hand-written drivers tend to miss key scenarios. PromeFuzz uses the knowledge embedded in large language models (such as Claude and GPT-4) to generate test drivers automatically, with the aim of making software security testing more efficient and lowering the barrier to high-quality fuzz testing.

Section 02

Project Background: Pain Points of Fuzz Testing Drivers and Core Ideas

Fuzz testing relies on high-quality drivers, but writing them by hand requires an in-depth understanding of code interfaces, data formats, and constraints. The work is time-consuming and prone to missing key scenarios. PromeFuzz takes a knowledge-driven approach: it gives a large language model the semantics of the source code (function signatures, types, control flow, etc.) and lets the model generate drivers using its internalized programming and security knowledge. This differs from traditional methods, which rely on templates or hand-written rules.
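To make the knowledge-driven idea concrete, here is a minimal sketch of how extracted API facts might be assembled into a generation prompt. It is not PromeFuzz's actual prompt: the `ApiFunction` type, the `build_prompt` helper, the prompt wording, and the example library `libexample` are all hypothetical. Only the `LLVMFuzzerTestOneInput` entry point is the standard libFuzzer interface.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ApiFunction:
    """One extracted API function (hypothetical structure for illustration)."""
    signature: str
    notes: List[str] = field(default_factory=list)  # constraints found in the source


def build_prompt(library: str, functions: List[ApiFunction]) -> str:
    """Assemble a driver-generation prompt from extracted signatures and constraints."""
    lines = [
        f"Write a libFuzzer driver in C for the library '{library}'.",
        "Entry point: int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size).",
        "Use only the functions listed below and check every return value.",
        "",
        "API functions:",
    ]
    for fn in functions:
        lines.append(f"- {fn.signature}")
        # Each constraint is listed under the function it applies to.
        lines.extend(f"  constraint: {note}" for note in fn.notes)
    return "\n".join(lines)


if __name__ == "__main__":
    api = [ApiFunction("int ex_parse(const char *buf, size_t len);",
                       ["buf must not be NULL"])]
    print(build_prompt("libexample", api))
```

The point of the sketch is that the model sees concrete signatures and constraints from the target project rather than a generic template.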

Section 03

System Architecture: Automated Workflow from Source Code to Fuzz Testing

Source Code Preprocessing and Knowledge Extraction: Obtain the compilation configuration via Clang/LLVM, parse function signatures, structs, etc., and build a project knowledge base;
LLM Driver Generation: Design prompts around three principles (accuracy, security, and exploration); models such as Claude Sonnet and GPT-4o are supported;
Driver Selection and Optimization: Evaluate candidate drivers for parameter completeness, error handling, etc., then compile the selected drivers into binaries with AddressSanitizer (ASan) and coverage instrumentation;
Fuzz Testing Execution: Run the drivers with libFuzzer and collect coverage, crash information, and the corpus.
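The last two stages can be sketched as command construction. This is a minimal sketch under stated assumptions, not PromeFuzz's own build logic: the helper names `build_compile_cmd` and `build_fuzz_cmd` and all file names are hypothetical. The clang flags and libFuzzer options shown are standard ones.

```python
from typing import List, Optional


def build_compile_cmd(driver_src: str, out_bin: str,
                      extra_inputs: Optional[List[str]] = None) -> List[str]:
    """Return a clang argv that links libFuzzer and enables ASan plus coverage."""
    cmd = [
        "clang",
        "-g", "-O1",
        "-fsanitize=fuzzer,address",  # libFuzzer runtime + AddressSanitizer
        "-fprofile-instr-generate",   # write coverage profile data at runtime
        "-fcoverage-mapping",         # embed the source-to-counter mapping
        driver_src,
    ]
    # Include paths and the library under test go after the driver source,
    # so a static library can resolve symbols the driver references.
    cmd += list(extra_inputs or [])
    cmd += ["-o", out_bin]
    return cmd


def build_fuzz_cmd(driver_bin: str, corpus_dir: str,
                   crash_dir: str, seconds: int) -> List[str]:
    """Return a libFuzzer argv with a time budget and a crash-artifact directory."""
    return [
        driver_bin,
        corpus_dir,
        f"-max_total_time={seconds}",
        f"-artifact_prefix={crash_dir.rstrip('/')}/",
    ]


if __name__ == "__main__":
    print(" ".join(build_compile_cmd("driver_0.c", "driver_0",
                                     ["-Iinclude", "libtarget.a"])))
    print(" ".join(build_fuzz_cmd("./driver_0", "corpus", "crashes", 60)))
```

Returning argv lists rather than shell strings lets them be passed directly to `subprocess.run` without quoting issues.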

Section 04

Technical Highlights: Extended Support and Performance Optimization

Extended support for 27 open-source project configurations (network protocols, image processing, compression algorithms, etc.);
Built-in benchmark framework that supports JSONL manifests for defining test cases, facilitating tool comparison;
Python 3.11 optimization: compatibility issues were fixed and critical paths were accelerated with Numba JIT, for a reported speedup of about 30x.
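As an illustration of the JSONL manifest idea, the sketch below parses one JSON object per line into typed entries. The field names (`name`, `repo`, `build_cmd`) are assumptions made for this example. The source does not specify the manifest schema, so check the project's own documentation for the real one.

```python
import json
from dataclasses import dataclass
from typing import Iterable, List

REQUIRED_FIELDS = ("name", "repo", "build_cmd")  # hypothetical schema


@dataclass
class ManifestEntry:
    name: str       # project identifier (assumed field)
    repo: str       # source location (assumed field)
    build_cmd: str  # build command (assumed field)


def parse_manifest(lines: Iterable[str]) -> List[ManifestEntry]:
    """Parse JSONL text: one JSON object per non-empty line."""
    entries = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: invalid JSON ({exc.msg})") from exc
        missing = [key for key in REQUIRED_FIELDS if key not in obj]
        if missing:
            raise ValueError(f"line {lineno}: missing fields {missing}")
        entries.append(ManifestEntry(obj["name"], obj["repo"], obj["build_cmd"]))
    return entries


if __name__ == "__main__":
    sample = [
        '{"name": "libexample", "repo": "path/to/libexample", "build_cmd": "make"}',
        "",
    ]
    for entry in parse_manifest(sample):
        print(entry)
```

Reporting the line number on failure matters for JSONL, since a single malformed line would otherwise be hard to locate in a long manifest.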

Section 05

Practical Application Scenarios: Multi-Domain Security Testing

Open-Source Auditing: Generate drivers for key libraries like curl and openssl to detect memory vulnerabilities;
CI/CD Integration: Automatically run fuzz tests after code changes to detect regression vulnerabilities early;
Security Research: Serve as a benchmark tool to evaluate the effectiveness of new fuzz testing algorithms.

Section 06

Installation and Usage Guide

Install system dependencies: clang, llvm-dev, libclang-dev, cmake, bear;
Create a Python virtual environment and install dependencies;
Run setup.sh to build the C++ preprocessor;
Configure Anthropic/OpenAI API keys;
Prepare a JSONL test manifest and run setup_and_run_all.sh to start the workflow.
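Before running the workflow, it can help to verify the prerequisites from the steps above. The following is a small sketch of such a check and is not part of PromeFuzz. It only looks for executables, since llvm-dev and libclang-dev are packages rather than commands. The environment variable names `ANTHROPIC_API_KEY` and `OPENAI_API_KEY` are the ones the vendors' SDKs conventionally read, and whether PromeFuzz reads the same names is an assumption.

```python
import os
import shutil
from typing import Callable, List, Mapping, Optional

REQUIRED_TOOLS = ("clang", "cmake", "bear")
API_KEY_VARS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY")  # assumed variable names


def preflight(which: Callable[[str], Optional[str]] = shutil.which,
              env: Mapping[str, str] = os.environ) -> List[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = [f"missing tool: {tool}"
                for tool in REQUIRED_TOOLS if which(tool) is None]
    # At least one provider key must be set and non-empty.
    if not any(env.get(var) for var in API_KEY_VARS):
        problems.append("no API key set: expected one of " + ", ".join(API_KEY_VARS))
    return problems


if __name__ == "__main__":
    issues = preflight()
    for issue in issues:
        print(issue)
    if not issues:
        print("environment looks ready")
```

The `which` and `env` parameters are injectable so the check can be exercised without changing the real environment.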

Section 07

Limitations and Future Directions

Limitations: dependence on compilation databases, which can be hard to produce for complex projects; LLM API costs; and run-to-run variation in generation quality.
Future Directions: support locally hosted open-source LLMs, use coverage feedback to refine drivers, and improve support for APIs with complex state.

Section 08

Conclusion: The Value of PromeFuzz

PromeFuzz combines large language models with fuzz testing. By automating driver generation, it lowers the barrier to testing and aims to improve code coverage and vulnerability detection, which makes it relevant to development teams, security researchers, and open-source maintainers.
