Flowguard: An Intelligent Code Defect Detection and Repair System Based on Large Language Models

Flowguard is an open-source project based on the LLMSAN paper that applies large language models (LLMs) to source-code defect detection and automatic repair. It covers common defect types such as null pointer dereference, division by zero, and type conversion issues, and ships with a complete FastAPI backend and Next.js frontend.

Tags: code defect detection, large language models, static analysis, automatic repair, Tree-sitter, FastAPI, LLMSAN, software security
Published 2026-05-13 14:15 | Recent activity 2026-05-13 14:22 | Estimated read 7 min

Section 01

Introduction: Flowguard, an LLM-based Code Defect Detection and Repair System

Flowguard is an open-source project based on LLMSAN, a paper from Purdue University published at EMNLP 2024. It applies large language models (LLMs) to source-code defect detection and automatic repair, covering common defect types such as null pointer dereference, division by zero, and type conversion issues. It ships a complete engineering implementation, with a FastAPI backend and a Next.js frontend, and aims to use the semantic understanding of LLMs to address the limitations of traditional static analysis tools.

Section 02

Project Background and Motivation

Code defect detection is a key quality safeguard across the software development lifecycle. Traditional static analysis tools suffer from high false positive rates and struggle with complex logic. As large language models have matured, using their semantic understanding for code analysis has emerged as a new direction. Following this trend, Flowguard turns the LLMSAN research results into a production-grade tool: it reproduces the core algorithms and exposes them through a RESTful API and a web interface.

Section 03

Core Technical Architecture

Flowguard separates the frontend from the backend:

  • Backend (flowguard-api): Built on the FastAPI framework. Its core components are the analysis engine (parses the syntactic structure of source code), the detector (uses an LLM to identify potential defects), the repairer (generates and verifies repair suggestions), and the parser (uses Tree-sitter for multi-language syntax analysis). It is deployed as a Docker container and has a built-in CI/CD pipeline (code checks, unit tests, image builds, and so on).
  • Frontend (flowguard-web): Built with Next.js. It provides a syntax-highlighted editor, a structured result view (defect location, risk level, repair suggestions), and one-click repair.
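
The division of labour among the backend components can be sketched as follows. This is a minimal illustration, not Flowguard's actual code: the names `Finding`, `analyze`, `detect`, `repair`, `run_pipeline`, and the `fake_llm` stand-in are all hypothetical, a line-based scan replaces the Tree-sitter parse so the sketch runs without dependencies, and the verification of repair suggestions is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical stand-in for an LLM call: prompt in, text out.
LLM = Callable[[str], str]


@dataclass
class Finding:
    line: int                         # 1-based line of the suspicious statement
    defect_type: str                  # e.g. "DBZ" for division by zero
    risk: str                         # assumed scale: "low", "medium", "high"
    suggestion: Optional[str] = None  # filled in by the repair step


def analyze(source: str) -> List[str]:
    """Analysis-engine placeholder: split the source into lines.

    The article describes a Tree-sitter parse at this stage.
    """
    return source.splitlines()


def detect(lines: List[str], llm: LLM) -> List[Finding]:
    """Detector placeholder: ask the LLM about each candidate line."""
    findings = []
    for number, text in enumerate(lines, start=1):
        if "/" not in text:
            continue  # cheap pre-filter so the LLM only sees candidates
        verdict = llm(f"Can the divisor be zero here? {text.strip()}")
        if verdict == "yes":
            findings.append(Finding(number, "DBZ", "high"))
    return findings


def repair(lines: List[str], findings: List[Finding], llm: LLM) -> List[Finding]:
    """Repairer placeholder: attach an LLM-generated suggestion to each finding."""
    for finding in findings:
        statement = lines[finding.line - 1].strip()
        finding.suggestion = llm(
            f"Suggest a fix for {finding.defect_type}: {statement}"
        )
    return findings


def run_pipeline(source: str, llm: LLM) -> List[Finding]:
    lines = analyze(source)
    return repair(lines, detect(lines, llm), llm)


def fake_llm(prompt: str) -> str:
    """Canned responses so the sketch runs offline."""
    if prompt.startswith("Can the divisor"):
        return "yes"
    return "Guard the division with a check that the divisor is not zero."


SOURCE = "int a = 10;\nint b = 0;\nint c = a / b;\n"
results = run_pipeline(SOURCE, fake_llm)
```

Passing the LLM in as a plain callable keeps each stage testable without network access; a real client could be substituted for `fake_llm` without changing the stages.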

Section 04

Supported Defect Types and Multi-language Expansion

Supported defect types: null pointer dereference (NPD), division by zero (DBZ), type conversion issues (CI), array out-of-bounds access (APT), and cross-site scripting (XSS). Together these range from memory safety to application-level security.

Multi-language support: Language support is implemented with the Tree-sitter parsing library, currently with full support for Java. Adding a language requires three steps: bringing in the corresponding Tree-sitter grammar, adjusting the node-type matching rules, and adapting the defect detection patterns. The documentation links to grammar files for mainstream languages.
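
One way to picture the per-language extension work is a registry that maps each language to its grammar and to the syntax node types worth inspecting for each defect type. The sketch below is an assumption about how such rules could be organized, not Flowguard's real data model: `LanguageProfile`, `REGISTRY`, `register_language`, and `candidate_node_types` are hypothetical names, and the node type strings are illustrative values that should be checked against each grammar's `node-types.json`.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class LanguageProfile:
    grammar: str                            # Tree-sitter grammar to load
    node_types: Dict[str, Tuple[str, ...]]  # defect type -> node types to inspect


# Java is the language the article describes as fully supported.
REGISTRY: Dict[str, LanguageProfile] = {
    "java": LanguageProfile(
        grammar="tree-sitter-java",
        node_types={
            "NPD": ("method_invocation", "field_access"),
            "DBZ": ("binary_expression",),
        },
    ),
}


def register_language(name: str, profile: LanguageProfile) -> None:
    """Add a language; refuses to overwrite an existing entry."""
    if name in REGISTRY:
        raise ValueError(f"language already registered: {name}")
    REGISTRY[name] = profile


def candidate_node_types(language: str, defect_type: str) -> Tuple[str, ...]:
    """Return the node types to inspect, or () if no rule exists yet."""
    if language not in REGISTRY:
        raise ValueError(f"unsupported language: {language}")
    return REGISTRY[language].node_types.get(defect_type, ())


# Extending to a second language (illustrative rules only).
register_language(
    "python",
    LanguageProfile(
        grammar="tree-sitter-python",
        node_types={
            "NPD": ("call", "attribute"),
            "DBZ": ("binary_operator",),
        },
    ),
)
```

Keeping the rules as data means the detection prompts still need adapting per language, but the matching step itself needs no new code.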

Section 05

Relationship with LLMSAN and Deployment Methods

LLMSAN adaptation: Flowguard's core logic comes from the LLMSAN paper. The engineering changes are: accepting source code as a string instead of through file I/O (to suit API use), replacing the disk cache with a streaming API (to improve response time), using Pydantic to standardize request and response formats, and preserving the reasoning behind each repair.

Deployment: The backend is distributed as a Docker image with the Tree-sitter Java library built in, and the frontend is managed with npm. Once an OpenAI API key is configured, the service can be started. The API supports file upload, streamed results, and batch processing.
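
The adaptation points above (string input, typed request and response shapes, streamed output, preserved repair reasoning) can be illustrated with a dependency-free sketch. Standard-library dataclasses stand in here for the Pydantic models the article mentions, and every name (`AnalyzeRequest`, `RepairResult`, `validate`, `stream_results`, `fake_analyzer`) and field is hypothetical rather than taken from Flowguard's API.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class AnalyzeRequest:
    code: str               # source passed as a string, not a file path
    language: str = "java"


@dataclass
class RepairResult:
    line: int
    defect_type: str
    fixed_code: str
    reasoning: str          # repair reasoning is kept alongside the fix


Analyzer = Callable[[AnalyzeRequest], Iterable[RepairResult]]


def validate(request: AnalyzeRequest) -> None:
    """Minimal checks of the kind a schema library would enforce."""
    if not request.code.strip():
        raise ValueError("code must not be empty")
    if request.language != "java":
        raise ValueError(f"unsupported language: {request.language}")


def stream_results(request: AnalyzeRequest, analyzer: Analyzer) -> Iterator[str]:
    """Emit one JSON line per result as it becomes available."""
    validate(request)  # fail fast, before any output is produced
    return (json.dumps(asdict(result)) + "\n" for result in analyzer(request))


def fake_analyzer(request: AnalyzeRequest) -> Iterator[RepairResult]:
    """Canned analyzer so the sketch runs without an LLM."""
    yield RepairResult(
        line=3,
        defect_type="DBZ",
        fixed_code="if (b != 0) { c = a / b; }",
        reasoning="b is assigned 0 on line 2",
    )


chunks = list(stream_results(AnalyzeRequest(code="int c = a / b;"), fake_analyzer))
```

Newline-delimited JSON is one common framing for streamed results because a client can parse each line as soon as it arrives; whether Flowguard uses this framing is not stated in the source.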

Section 06

Practical Application Value and Limitations

Application scenarios: code review assistance (pre-submission scanning), legacy code analysis (security audits), education and training (learning to recognize code pitfalls), and continuous integration (automated checks). Its advantage over traditional tools is the LLM's ability to reason about complex logic and context.

Limitations: Flowguard mainly supports Java, depends on the OpenAI API (which raises data privacy concerns), and incurs relatively high LLM inference costs.
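
For the continuous integration scenario, a pipeline step typically turns analysis findings into a pass or fail signal. The sketch below shows one possible gate under stated assumptions: the finding fields (`file`, `line`, `defect_type`, `risk`), the three-level risk scale, and the `gate` function are hypothetical and not part of any documented Flowguard output format.

```python
from typing import Iterable, Mapping

# Assumed risk scale, ordered from least to most severe.
RISK_ORDER = {"low": 0, "medium": 1, "high": 2}


def gate(findings: Iterable[Mapping], fail_at: str = "high") -> int:
    """Return a process exit code: 1 if any finding reaches the threshold."""
    threshold = RISK_ORDER[fail_at]
    blocking = [f for f in findings if RISK_ORDER[f["risk"]] >= threshold]
    for f in blocking:
        # One line per blocking finding, in a compiler-style format.
        print(f"{f['file']}:{f['line']}: {f['defect_type']} ({f['risk']})")
    return 1 if blocking else 0


example_findings = [
    {"file": "Calc.java", "line": 3, "defect_type": "DBZ", "risk": "high"},
    {"file": "Calc.java", "line": 9, "defect_type": "NPD", "risk": "medium"},
]
exit_code = gate(example_findings)
```

A CI job could pass the returned value to `sys.exit` so that the build fails only on findings at or above the chosen risk level.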

Section 07

Conclusion and Future Outlook

Flowguard represents a new direction for code analysis tools: combining traditional static analysis with the semantic understanding of LLMs. Planned work includes supporting local open-source models, adding more programming languages, optimizing inference performance to reduce cost, and integrating more deeply with enterprise development tools. With these improvements, it is expected to play a larger role in software development.