Research on Vulnerability Handling Workflow Based on Role-Based Agent Architecture

This paper proposes a role-based agent workflow architecture for handling software security vulnerabilities, built around four core roles: planner, analyst, fixer, and verifier. With CodeQL integration and multi-model collaboration, it reports 44% detection accuracy and 19% repair accuracy on 25 real C/C++ vulnerabilities.

agentic workflow, vulnerability handling, software security, role-based architecture, CodeQL, multi-agent system, LLM security

Published 2026-06-12 16:45 | Recent activity 2026-06-15 12:22 | Estimated read 6 min

Section 01

Research on Vulnerability Handling Workflow Based on Role-Based Agent Architecture (Introduction)

Original Author/Maintainer: Paper Author Team (arXiv)
Source Platform: arXiv
Publication Date: 2026-06-12
Core Viewpoints: This paper proposes a role-based agent workflow architecture for handling software security vulnerabilities, built around four core roles: planner, analyst, fixer, and verifier. With CodeQL integration and multi-model collaboration, it reports 44% detection accuracy and 19% repair accuracy on 25 real C/C++ vulnerabilities.
Original Link: http://arxiv.org/abs/2606.14261v1

Section 02

Background and Problems

Current approaches to software security based on Large Language Models (LLMs) mostly target isolated tasks, such as vulnerability detection or patch generation, and lack agent architectures that reflect industrial practice. This leaves a significant gap between research results and real-world requirements. Because single-task methods cannot model how security engineers collaborate in practice, the paper argues for a multi-role collaborative agent architecture to make LLMs more effective in this setting.

Section 03

Core Method: Role-Based Agent Workflow

The study proposes a role-based agent workflow architecture that decomposes vulnerability handling into four core roles:

Planner: Formulates the overall vulnerability handling strategy and determines the analysis scope, repair priority, and resource allocation;
Analyst: Analyzes vulnerability types, severity, and root causes in depth, and integrates the CodeQL static analysis tool to strengthen code analysis;
Fixer: Generates repair solutions, such as code patches and configuration adjustments, based on the analysis results;
Verifier: Validates the repair solutions to ensure the vulnerabilities are fixed without introducing new issues.
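
The hand-off between the four roles can be pictured as a simple sequential pipeline. The sketch below is illustrative only and is not the paper's implementation: the `Case` record, the function names, and the toy `strcpy` check are all hypothetical stand-ins for what would be LLM calls and CodeQL queries in a real system.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Case:
    """State passed from role to role (all field names are hypothetical)."""
    source: str
    plan: List[str] = field(default_factory=list)
    findings: List[str] = field(default_factory=list)
    patch: Optional[str] = None
    verified: bool = False
    log: List[str] = field(default_factory=list)


def planner(case: Case) -> Case:
    # A fixed plan stands in for an LLM deciding scope and priority.
    case.plan = ["analyze", "fix", "verify"]
    case.log.append("planner")
    return case


def analyst(case: Case) -> Case:
    # Toy check standing in for LLM reasoning plus static-analysis results.
    if "strcpy(" in case.source:
        case.findings.append("possible buffer overflow: unbounded strcpy")
    case.log.append("analyst")
    return case


def fixer(case: Case) -> Case:
    # Propose a patch only when the analyst reported a finding.
    if case.findings:
        case.patch = case.source.replace(
            "strcpy(dst, src)", 'snprintf(dst, sizeof(dst), "%s", src)'
        )
    case.log.append("fixer")
    return case


def verifier(case: Case) -> Case:
    # Re-run the analyst's check on the patched code. If the fixer could not
    # rewrite the call, the patch still contains strcpy and verification fails.
    case.verified = case.patch is not None and "strcpy(" not in case.patch
    case.log.append("verifier")
    return case


# The workflow is an ordered list of roles, so roles can be added or swapped.
ROLES: List[Callable[[Case], Case]] = [planner, analyst, fixer, verifier]


def handle(source: str) -> Case:
    case = Case(source=source)
    for role in ROLES:
        case = role(case)
    return case


if __name__ == "__main__":
    result = handle("char dst[16]; strcpy(dst, src);")
    print(result.findings, result.verified)
```

Keeping the roles in an ordered list is one possible design choice; it makes the extension point explicit, since a new role is just another function appended to `ROLES`.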

Section 04

Experimental Design and Results

Experimental Configuration: The workflow was evaluated on 25 real C/C++ vulnerabilities, using models including nemotron-cascade-2:30b, qwen3-coder-next, and gpt-oss:120b.
Results: Detection accuracy reached 44% (reported as equivalent to GPT5.5) and repair accuracy reached 19%. CodeQL integration is reported to significantly improve analysis depth and accuracy.
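
To put the percentages in terms of the 25-case benchmark, the short check below converts each rate to an implied number of cases. It assumes each rate is a plain fraction of the 25 vulnerabilities, which the summary does not confirm; the function names are hypothetical.

```python
from fractions import Fraction


def implied_count(rate_percent: int, total: int) -> Fraction:
    """Number of cases implied by a percentage of `total` (exact arithmetic)."""
    return Fraction(rate_percent, 100) * total


def is_whole(rate_percent: int, total: int) -> bool:
    """True when the percentage corresponds to a whole number of cases."""
    return implied_count(rate_percent, total).denominator == 1


if __name__ == "__main__":
    for rate in (44, 19):
        count = implied_count(rate, 25)
        print(f"{rate}% of 25 = {float(count)} (whole: {is_whole(rate, 25)})")
```

Under that assumption, 44% corresponds to exactly 11 of 25 cases, while 19% gives 4.75, which is not a whole number. The repair figure is therefore presumably computed over a different denominator or averaged across models or runs; the summary does not say which.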

Section 05

Practical Significance and Limitations

Significance:

Moves toward end-to-end automation of software security workflows, which can ease the shortage of security talent;
Provides a new mode of human-machine collaboration, with roles that correspond to actual team functions;
The architecture is extensible, supporting the addition of new roles or the adjustment of responsibilities.

Limitations:

Repair accuracy (19%) needs to be improved;
The evaluation dataset is small (25 vulnerabilities);
Only C/C++ is supported, and applicability to other languages remains to be verified.

Section 06

Conclusions and Future Directions

Conclusions: The role-based agent architecture opens up a new direction for applying LLMs to software security. Decomposing tasks into specialized roles improves interpretability and maintainability and demonstrates the potential of multi-agent collaboration.

Future Directions:

Apply reinforcement learning to optimize agent collaboration strategies;
Integrate security vulnerability databases and best practice guidelines;
Expand multi-language support;
Develop real-time human-machine collaboration interfaces.