# Hybrid Multi-Agent Architecture: Enhancing CodeQL Static Analysis with LLMs for a 4x F1 Score Improvement

> This cybersecurity master's thesis proposes a three-agent hybrid architecture that combines large language models (LLMs) with the CodeQL static analysis tool. The Analyzer agent validates CodeQL results, the Suggestor agent identifies coverage gaps, and the Creator agent generates new queries. On a Python vulnerability dataset, the approach raises the F1 score roughly fourfold, from 0.11 to 0.43.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-04-10T09:00:46.000Z
- Last activity: 2026-04-10T09:20:10.089Z
- Popularity: 141.7
- Keywords: CodeQL, SAST, LLM, Static Analysis, Vulnerability Detection, Multi-Agent, DevSecOps, Security
- Page link: https://www.zingnex.cn/en/forum/thread/llmcodeql-f14
- Canonical: https://www.zingnex.cn/forum/thread/llmcodeql-f14
- Markdown source: floors_fallback

---

## 【Introduction】Hybrid Multi-Agent Architecture: Core Breakthroughs in LLM-Enhanced CodeQL Static Analysis

The thesis proposes a three-agent hybrid architecture that combines LLMs with CodeQL to address the limitations of traditional SAST tools. The Analyzer, Suggestor, and Creator agents form a closed loop: alerts are validated, coverage gaps are identified, and new queries are generated to fill them. On a Python vulnerability dataset, the F1 score rises roughly fourfold, from 0.11 to 0.43, while CodeQL's determinism and auditability are retained.

## 【Background】Dilemmas of Static Analysis Tools and the Necessity of Hybrid Solutions

SAST tools like CodeQL have two major limitations: they lack contextual reasoning, which leads to false positives, and they cannot detect vulnerability patterns that their queries do not already describe. Pure LLM approaches, in turn, struggle with reproducibility, cost, and DevSecOps integration. This motivates a hybrid solution that keeps CodeQL's strengths and uses an LLM only where contextual reasoning is needed.

## 【Methodology】Design Details of the Three-Agent Hybrid Architecture

The system includes three specialized agents:

1. **Analyzer Agent**: Runs CodeQL, parses its results, and uses an LLM to validate each alert (judging from the surrounding source code whether it is a true vulnerability);
2. **Suggestor Agent**: Analyzes CodeQL coverage gaps (false negatives) and generates structured improvement proposals (e.g., missing source/sink points);
3. **Creator Agent**: Converts the proposals into CodeQL queries and attempts to validate them by compilation.

The design retains CodeQL's determinism while delegating contextual reasoning tasks to the LLM.
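The Analyzer's validation step can be sketched as follows. This is a minimal illustration, not the thesis implementation: it assumes CodeQL results are available as a SARIF 2.1.0 log, and the names `Alert`, `parse_sarif`, `context_window`, `validate_alerts`, and `keyword_judge` are hypothetical. The judge here is a keyword stub standing in for the LLM call, and the sample alerts and file names are made up for the example.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Alert:
    rule_id: str
    file: str
    line: int
    message: str


def parse_sarif(sarif_text: str) -> List[Alert]:
    """Flatten the results of a SARIF 2.1.0 log into Alert records."""
    alerts = []
    for run in json.loads(sarif_text).get("runs", []):
        for result in run.get("results", []):
            locations = result.get("locations") or [{}]
            physical = locations[0].get("physicalLocation", {})
            alerts.append(Alert(
                rule_id=result.get("ruleId", ""),
                file=physical.get("artifactLocation", {}).get("uri", ""),
                line=physical.get("region", {}).get("startLine", 0),
                message=result.get("message", {}).get("text", ""),
            ))
    return alerts


def context_window(source: str, line: int, radius: int = 3) -> str:
    """Return the lines within `radius` of a 1-based line number."""
    lines = source.splitlines()
    start = max(line - 1 - radius, 0)
    return "\n".join(lines[start:line + radius])


def validate_alerts(
    alerts: List[Alert],
    sources: Dict[str, str],
    judge: Callable[[Alert, str], bool],
) -> List[Alert]:
    """Keep only the alerts that the judge confirms from source context."""
    confirmed = []
    for alert in alerts:
        snippet = context_window(sources.get(alert.file, ""), alert.line)
        if judge(alert, snippet):
            confirmed.append(alert)
    return confirmed


def keyword_judge(alert: Alert, snippet: str) -> bool:
    # Placeholder for the LLM call (assumption): a real judge would send
    # the alert and the snippet to a model and parse its verdict.
    return "shell=True" in snippet


# Illustrative inputs only; they are not taken from the thesis dataset.
SAMPLE_SARIF = json.dumps({
    "version": "2.1.0",
    "runs": [{"results": [
        {"ruleId": "py/command-line-injection",
         "message": {"text": "Command depends on user input."},
         "locations": [{"physicalLocation": {
             "artifactLocation": {"uri": "app.py"},
             "region": {"startLine": 2}}}]},
        {"ruleId": "py/command-line-injection",
         "message": {"text": "Command depends on user input."},
         "locations": [{"physicalLocation": {
             "artifactLocation": {"uri": "safe.py"},
             "region": {"startLine": 2}}}]},
    ]}],
})
SOURCES = {
    "app.py": "import subprocess\nsubprocess.run(cmd, shell=True)\n",
    "safe.py": "import subprocess\nsubprocess.run(['ls', '-l'])\n",
}

ALERTS = parse_sarif(SAMPLE_SARIF)
CONFIRMED = validate_alerts(ALERTS, SOURCES, keyword_judge)
print([a.file for a in CONFIRMED])
```

Passing the judge in as a callable keeps the deterministic part (parsing and context extraction) separate from the model-dependent part, which mirrors the division of labor described above.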

## 【Evidence】Experimental Results and Performance Evaluation

**Dataset**: 27 Python vulnerability files covering CWE-78 (OS command injection, 7 files), CWE-89 (SQL injection, 10 files), and CWE-79 (cross-site scripting, 10 files).

**Performance Results**:

| System | Precision | Recall | F1 Score |
|---|---|---|---|
| Baseline CodeQL | 0.167 | 0.080 | 0.108 |
| Analyzer Agent | 0.667 | 0.320 | 0.432 |

The F1 score improved by approximately 4x (0.108 to 0.432).

**LLM-as-Judge Evaluation**: Suggestor proposals received an average quality score of 4.78/5, while Creator queries scored 3.0/5, with lower quality for the CWE-78 queries.
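The F1 figures reported in this section follow from the precision and recall values via the harmonic mean. A short check, assuming only the rounded precision and recall published above (so the last digit can differ slightly from the reported F1):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


baseline_f1 = f1_score(0.167, 0.080)   # Baseline CodeQL
analyzer_f1 = f1_score(0.667, 0.320)   # Analyzer Agent
improvement = analyzer_f1 / baseline_f1

print(f"baseline={baseline_f1:.4f} analyzer={analyzer_f1:.4f} ratio={improvement:.2f}")
```

Both values agree with the reported F1 scores to within rounding, and their ratio is close to 4, which is the basis of the "approximately 4x" claim.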

## 【Limitations and Outlook】Current Shortcomings and Future Directions

**Limitations**: Generated queries still require manual syntax adjustments; coverage is limited to three CWE categories; the dataset is small; only Python is supported.

**Future Directions**: Improve the Creator's code generation capability; expand to more CWEs and programming languages; integrate into CI/CD pipelines; explore more efficient prompt engineering.

## 【Industry Implications】Significance of Hybrid Architecture for Security Tool Development

1. **Hybrid Is Better Than Replacement**: The LLM serves as an enhancement layer, so the auditability and interpretability of the traditional tool are retained;
2. **Agent Specialization**: Agents with a clear division of labor are more effective than general-purpose agents;
3. **Human-AI Collaboration**: Generated queries need manual refinement, which reflects AI assistance rather than replacement;
4. **Ease of Integration**: The system is compatible with the CodeQL CLI, so it fits into existing DevSecOps workflows.
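To make the CodeQL CLI compatibility concrete, the sketch below builds the command lines a pipeline step might run: creating a database, analyzing it into SARIF, and compiling a generated query as a syntax check. The subcommands and flags are from the CodeQL CLI; the helper function names and all paths are hypothetical, and the thesis does not state that it invokes the CLI in exactly this way.

```python
import subprocess
from typing import List


def database_create_cmd(db_dir: str, source_root: str) -> List[str]:
    """Command to build a CodeQL database for a Python code base."""
    return [
        "codeql", "database", "create", db_dir,
        "--language=python",
        f"--source-root={source_root}",
    ]


def database_analyze_cmd(db_dir: str, query_or_suite: str, sarif_path: str) -> List[str]:
    """Command to run a query or query suite and write SARIF output."""
    return [
        "codeql", "database", "analyze", db_dir, query_or_suite,
        "--format=sarif-latest",
        f"--output={sarif_path}",
    ]


def query_compile_cmd(query_path: str) -> List[str]:
    """Command to compile a query, e.g. to check a generated query's syntax."""
    return ["codeql", "query", "compile", query_path]


def run_step(cmd: List[str]) -> None:
    """Run one step and raise if the CLI exits with a non-zero status."""
    subprocess.run(cmd, check=True)


# Hypothetical paths, shown only to illustrate the order of steps.
PIPELINE = [
    database_create_cmd("build/codeql-db", "src"),
    database_analyze_cmd("build/codeql-db", "queries/generated.ql", "build/results.sarif"),
]
```

In a CI job, each entry in `PIPELINE` would be passed to `run_step`, and the resulting SARIF file would then be handed to the Analyzer for validation.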