Zing Forum


Hybrid Multi-Agent Architecture: Enhancing CodeQL Static Analysis with LLM, 4x F1 Score Improvement

This cybersecurity master's thesis proposes an innovative three-agent hybrid architecture that combines large language models (LLMs) with the CodeQL static analysis tool. The Analyzer agent validates CodeQL results, the Suggestor agent identifies coverage gaps, and the Creator agent generates new queries. On a Python vulnerability dataset, this approach achieves a 4x improvement in F1 score from 0.11 to 0.43.

Tags: CodeQL, SAST, LLM, Static Analysis, Vulnerability Detection, Multi-Agent, DevSecOps, Security
Published 2026-04-10 17:00 · Recent activity 2026-04-10 17:20 · Estimated read 5 min

Section 01

【Introduction】Hybrid Multi-Agent Architecture: Core Breakthroughs in LLM-Enhanced CodeQL Static Analysis

This article proposes an innovative three-agent hybrid architecture that combines LLMs with CodeQL to address the limitations of traditional SAST tools. Through a closed loop formed by the Analyzer, Suggestor, and Creator agents, it achieves a 4x improvement in F1 score from 0.11 to 0.43 on a Python vulnerability dataset, while retaining CodeQL's determinism and auditability.


Section 02

【Background】Dilemmas of Static Analysis Tools and the Necessity of Hybrid Solutions

SAST tools like CodeQL have two major limitations: a lack of contextual reasoning, which leads to false positives, and an inability to detect new vulnerability patterns. Pure LLM approaches, meanwhile, face issues with reproducibility, cost, and DevSecOps integration. This motivates hybrid solutions that retain CodeQL's strengths while leveraging the LLM's reasoning capabilities.


Section 03

【Methodology】Design Details of the Three-Agent Hybrid Architecture

The system includes three specialized agents:

  1. Analyzer Agent: Runs CodeQL, parses its results, and uses an LLM to validate each alert (judging whether it is a true vulnerability from the surrounding source code context);
  2. Suggestor Agent: Analyzes CodeQL coverage gaps (false negatives) and generates structured improvement proposals (e.g., missing source/sink points);
  3. Creator Agent: Converts proposals into CodeQL queries and attempts compilation validation.

The design retains CodeQL's determinism while delegating contextual reasoning tasks to the LLM.
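As a rough sketch, the closed loop formed by the three agents can be wired together as below. The Alert type, the judge callback, and the stub query strings are illustrative assumptions, not the thesis implementation; in the real pipeline the judge call would be an LLM prompt over the alert's source context.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Alert:
    # Minimal stand-in for a parsed CodeQL alert.
    rule_id: str
    file: str
    line: int

def analyzer_agent(alerts: list[Alert], judge: Callable[[Alert], bool]) -> list[Alert]:
    # Analyzer: keep only alerts the LLM judge deems true positives.
    return [a for a in alerts if judge(a)]

def suggestor_agent(validated: list[Alert], ground_truth: list[Alert]) -> list[dict]:
    # Suggestor: each known vulnerability that survives validation nowhere
    # becomes a structured improvement proposal (e.g. a missing sink).
    found = set(validated)
    return [{"cwe": g.rule_id, "hint": f"missing sink near {g.file}:{g.line}"}
            for g in ground_truth if g not in found]

def creator_agent(proposals: list[dict]) -> list[str]:
    # Creator: turn each proposal into a draft CodeQL query (stubbed here
    # as a comment string; the real agent would also try to compile it).
    return [f"// draft query for {p['cwe']}: {p['hint']}" for p in proposals]
```

Keeping the judge as an injected callback means the orchestration itself stays deterministic and testable, with only the Analyzer's validation step touching the LLM.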

Section 04

【Evidence】Experimental Results and Performance Evaluation

Dataset: 27 Python vulnerability files covering CWE-78 (7), CWE-89 (10), and CWE-79 (10).

Performance results:

System            Precision   Recall   F1 Score
Analyzer Agent    0.667       0.320    0.432
Baseline CodeQL   0.167       0.080    0.108

The F1 score improved by approximately 4x (0.108 → 0.432).
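The table's F1 values can be checked directly from the reported precision and recall, since F1 is their harmonic mean:

```python
def f1(precision: float, recall: float) -> float:
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

analyzer_f1 = f1(0.667, 0.320)           # ~0.432
baseline_f1 = f1(0.167, 0.080)           # ~0.108
improvement = analyzer_f1 / baseline_f1  # ~4.0x
```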
LLM-as-Judge evaluation: Suggestor proposals averaged 4.78/5 in quality, while Creator-generated queries averaged 3.0/5 (quality was lowest for CWE-78 generation).
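A minimal sketch of the LLM-as-Judge aggregation, assuming each artifact receives a 1–5 rubric score from a judge model and the reported quality is the mean. The example scores are invented for illustration, not the thesis data.

```python
from statistics import mean

def judge_quality(scores: list[int]) -> float:
    # Aggregate per-artifact rubric scores (1-5 scale) into a mean quality.
    assert all(1 <= s <= 5 for s in scores), "rubric is a 1-5 scale"
    return round(mean(scores), 2)
```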

Section 05

【Limitations and Outlook】Current Shortcomings and Future Directions

Limitations: generated queries require manual syntax adjustments; only three CWE types are covered; the dataset is small; only Python is supported. Future directions: improve the Creator's code generation capability; expand to more CWEs and programming languages; integrate into CI/CD pipelines; explore more efficient prompt engineering.


Section 06

【Industry Implications】Significance of Hybrid Architecture for Security Tool Development

  1. Hybrid is Better Than Replacement: LLM serves as an enhancement layer, retaining the auditability and interpretability of traditional tools;
  2. Agent Specialization: Agents with clear division of labor are more effective than general-purpose agents;
  3. Human-AI Collaboration: Generated queries need manual refinement, reflecting AI assistance rather than replacement;
  4. Integrability: Compatible with the CodeQL CLI, allowing seamless integration into existing DevSecOps workflows.
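As an integration sketch: a CI job can run `codeql database analyze` with SARIF output and feed the flattened alerts to the Analyzer agent. The subcommand and flags below follow the public `codeql` CLI, but the database path and query pack name are placeholders.

```python
def analyze_cmd(db_path: str, query_pack: str, out_path: str) -> list[str]:
    # Build a `codeql database analyze` invocation that writes SARIF,
    # the format parsed downstream by the Analyzer agent.
    return ["codeql", "database", "analyze", db_path, query_pack,
            "--format=sarif-latest", f"--output={out_path}"]

def alerts_from_sarif(sarif: dict) -> list[dict]:
    # Flatten SARIF results into simple alert records (rule, file, line).
    alerts = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            loc = result["locations"][0]["physicalLocation"]
            alerts.append({
                "rule_id": result.get("ruleId", ""),
                "file": loc["artifactLocation"]["uri"],
                "line": loc["region"]["startLine"],
            })
    return alerts
```

Because SARIF is the interchange format, the same parsing step works whether the queries came from the standard CodeQL packs or from the Creator agent's drafts.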