# From Cloud to Edge: A Privacy-First Approach for Automated Software Vulnerability Detection Using Large Language Models

> This article introduces a multi-stage framework that uses large language models to detect security vulnerabilities in source code. It compares Google Gemini's cloud API with a locally deployed, quantized Llama 3 model and shows that the local model can reach a 96% recall rate while keeping the code private.

- Board: [Openclaw Geo](https://www.zingnex.cn/en/forum/board/openclaw-geo)
- Published: 2026-05-21T17:42:08.000Z
- Last activity: 2026-05-21T17:52:50.233Z
- Keywords: vulnerability detection, LLM, static analysis, SAST, prompt engineering, local deployment, privacy protection, code security, Llama 3, edge computing
- Page link: https://www.zingnex.cn/en/forum/thread/geo-github-anvitsatishshah-automated-software-vulnerability-detection-using-large-language
- Canonical: https://www.zingnex.cn/forum/thread/geo-github-anvitsatishshah-automated-software-vulnerability-detection-using-large-language

---

## Introduction: Summary of the Privacy-First LLM Vulnerability Detection Approach

This article presents a graduation project by a student team from India. To address the limitations of traditional SAST tools and the privacy risks of sending code to cloud-hosted LLMs, the team proposes a multi-stage framework that balances detection capability against privacy protection. It compares Google Gemini's cloud API with a locally deployed, quantized Llama 3 model and uses prompt engineering to reach a 96% recall rate on local hardware, so the code never leaves the machine. The project also ships an interactive Streamlit interface, making it a practical tool for enterprises and learners alike.

## Problem Background: Limitations of SAST and the Privacy Trade-off of Cloud LLMs

Traditional SAST tools suffer from high false positive rates and lack semantic understanding of code, which makes complex logic vulnerabilities hard to detect. LLMs can recognize subtler vulnerability patterns, but sending code to a cloud API risks leaking private code and intellectual property. The core question is therefore whether an LLM running on local hardware can deliver useful detection capability without that privacy cost.

## Three-Stage Experimental Framework: Transition from Cloud to Local

The framework has three stages:

1. Cloud baseline: zero-shot inference with the Google Gemini 2.5 Flash API establishes the performance benchmark.
2. Local deployment: Meta's Llama 3 8B model, quantized to 4 bits, runs on an NVIDIA RTX 3060 (12GB) via Ollama.
3. Prompt engineering optimization: zero-shot, role-playing, and few-shot prompts are compared, and few-shot prompting proves the most effective.
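The stage 3 prompt strategies and the stage 2 call to the local model can be sketched as below. This is a minimal sketch, not the project's code: the prompt wording, the few-shot exemplars, the `VULNERABLE:`/`SAFE` answer format and the helper names `build_prompt` and `ask_ollama` are hypothetical. It assumes an Ollama server on its default port `11434` with the model pulled via `ollama pull llama3`, and it uses Ollama's `/api/generate` endpoint with streaming disabled.

```python
import json
import urllib.request

# Assumption: Ollama's default local endpoint; /api/generate takes a JSON body.
OLLAMA_URL = "http://localhost:11434/api/generate"

# Hypothetical answer format so that replies are easy to parse.
ANSWER_FORMAT = 'Answer with "VULNERABLE: <type>" or "SAFE", then one sentence of justification.'

# Hypothetical few-shot exemplars (the project's real exemplars are not shown in the source).
FEW_SHOT_EXAMPLES = [
    ('query = "SELECT * FROM users WHERE id = " + user_id\ncursor.execute(query)',
     "VULNERABLE: SQL injection"),
    ("char buf[8];\nstrcpy(buf, argv[1]);", "VULNERABLE: buffer overflow"),
    ('cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))', "SAFE"),
]


def build_prompt(code: str, strategy: str) -> str:
    """Wrap a code snippet in one of the three prompt strategies."""
    if strategy == "zero_shot":
        # Bare task description: no persona, no examples.
        return f"Does the following code contain a security vulnerability?\n{ANSWER_FORMAT}\n\n{code}"
    if strategy == "role_play":
        # Same task, framed with an expert persona.
        return (
            "You are a senior application security engineer reviewing code. "
            "Look for SQL injection, XSS, buffer overflows and memory leaks.\n"
            f"{ANSWER_FORMAT}\n\n{code}"
        )
    if strategy == "few_shot":
        # Labeled examples first, then the snippet to classify in the same layout.
        shots = "\n\n".join(f"Code:\n{c}\nAnswer: {a}" for c, a in FEW_SHOT_EXAMPLES)
        return f"{ANSWER_FORMAT}\n\n{shots}\n\nCode:\n{code}\nAnswer:"
    raise ValueError(f"unknown strategy: {strategy!r}")


def ask_ollama(prompt: str, model: str = "llama3", timeout: float = 120.0) -> str:
    """Send one non-streaming request to a local Ollama server and return its text reply."""
    body = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,                # one JSON object instead of a token stream
        "options": {"temperature": 0},  # more repeatable output when comparing strategies
    }).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

With the server running, `ask_ollama(build_prompt(snippet, "few_shot"))` returns the model's raw text answer. Keeping the answer format identical across strategies means only the framing changes between runs, which keeps the comparison fair.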

## Dataset Design and Key Results: Local Detection with 96% Recall Rate

The dataset builds on CodeXGLUE and covers web application vulnerabilities (SQL injection and XSS in Python and PHP) as well as system-level vulnerabilities (buffer overflows and memory leaks in C/C++). The key result is that the local model achieves a 96% recall rate. Because it runs on-premises, it also keeps code private, incurs no API costs, and has latency low enough for CI/CD pipelines.
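Recall is the share of truly vulnerable samples that the model flags, TP / (TP + FN). The sketch below computes it alongside precision and false-positive rate. The helper names are hypothetical, and the counts in the usage example are made up for illustration rather than taken from the project. They show how a 96% recall can coexist with many false alarms, which is the trade-off raised under the limitations.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Confusion:
    tp: int = 0  # vulnerable and flagged
    fp: int = 0  # safe but flagged (false alarm)
    fn: int = 0  # vulnerable but missed
    tn: int = 0  # safe and not flagged


def confusion(labels: Sequence[int], preds: Sequence[int]) -> Confusion:
    """Count outcomes; labels and preds use 1 = vulnerable, 0 = safe."""
    if len(labels) != len(preds):
        raise ValueError("labels and preds must have the same length")
    c = Confusion()
    for y, p in zip(labels, preds):
        if y and p:
            c.tp += 1
        elif p:
            c.fp += 1
        elif y:
            c.fn += 1
        else:
            c.tn += 1
    return c


def _ratio(num: int, den: int) -> float:
    """Safe division: an empty denominator yields 0.0 instead of an error."""
    return num / den if den else 0.0


def recall(c: Confusion) -> float:
    return _ratio(c.tp, c.tp + c.fn)


def precision(c: Confusion) -> float:
    return _ratio(c.tp, c.tp + c.fp)


def false_positive_rate(c: Confusion) -> float:
    return _ratio(c.fp, c.fp + c.tn)


if __name__ == "__main__":
    # Illustrative counts only: 50 vulnerable + 50 safe samples.
    labels = [1] * 50 + [0] * 50
    preds = [1] * 48 + [0] * 2 + [1] * 20 + [0] * 30
    c = confusion(labels, preds)
    print(f"recall={recall(c):.2f} precision={precision(c):.2f} fpr={false_positive_rate(c):.2f}")
    # recall=0.96 precision=0.71 fpr=0.40
```

Recall is the natural headline metric for a screening tool, since a missed vulnerability costs more than a false alarm that a reviewer dismisses. Precision and false-positive rate should still be reported next to it.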

## Engineering Implementation and Interactive Interface: Lowering the Barrier to Use

The Streamlit interface lets users paste code, choose a prompt strategy, view the analysis report, and compare results. On the engineering side, the project requires an NVIDIA GPU with 12GB of VRAM, manages dependencies through requirements.txt, and uses a modular design that makes experiments easy to run.
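A rough calculation shows why 4-bit quantization lets an 8B model fit in 12GB of VRAM. This is a back-of-the-envelope sketch, not a measurement from the project. It assumes about 8 billion parameters and the published Llama 3 8B attention layout (32 layers, 8 key/value heads of dimension 128, FP16 KV cache). It treats 4-bit as exactly 4 bits per weight, although quantized formats also store per-block scales. It also ignores runtime overhead such as the CUDA context and activations, so real usage is somewhat higher.

```python
GIB = 1024 ** 3  # bytes per GiB


def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Memory for the model weights alone: parameters x bits / 8."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(context_len: int, n_layers: int = 32, n_kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_value: int = 2) -> int:
    """Keys and values (factor 2) for every layer and KV head, per cached token.

    Defaults assume Llama 3 8B's grouped-query attention layout and an FP16 cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * context_len


def fits_in_vram(n_params: float, bits_per_weight: float, context_len: int,
                 vram_gib: float = 12.0) -> bool:
    """True if weights + KV cache fit; ignores CUDA context, activations, fragmentation."""
    total = weight_bytes(n_params, bits_per_weight) + kv_cache_bytes(context_len)
    return total / GIB <= vram_gib


if __name__ == "__main__":
    n_params = 8.0e9  # assumption: ~8B parameters
    for bits in (16, 8, 4):
        weights = weight_bytes(n_params, bits) / GIB
        kv = kv_cache_bytes(8192) / GIB
        print(f"{bits:>2}-bit: weights {weights:5.1f} GiB + KV cache {kv:.1f} GiB "
              f"-> fits in 12 GiB: {fits_in_vram(n_params, bits, 8192)}")
```

At FP16 the weights alone come to about 14.9 GiB, which exceeds the card's memory. The 4-bit weights need about 3.7 GiB, plus 1 GiB of KV cache for an 8,192-token context, which leaves headroom on a 12GB card.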

## Limitations and Future Directions: Areas for Improvement

Limitations include the hardware requirement (a discrete GPU such as the RTX 3060 is not available on every machine), a high false positive rate, limited coverage of vulnerability types, and model updates that must be applied manually. Future directions include more aggressive quantization (e.g., GGUF Q4_K_M), a secondary filtering stage to cut false positives, broader vulnerability coverage, and a version management process for model updates.
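One possible form for the secondary filtering mentioned above is a rule-based cross-check on the LLM's findings. This is a hypothetical sketch, not something the project implements. The `Finding` type, the `EVIDENCE` rules and `triage` are illustrative names, and the regular expressions catch only a few textbook sinks. Findings without rule evidence go to human review rather than being discarded, because dropping them outright would lower the recall that made the local model attractive.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Finding:
    vuln_type: str  # label produced by the LLM, e.g. "SQL Injection"
    code: str       # the snippet the LLM flagged


def _any_pattern(*patterns: str) -> Callable[[str], bool]:
    """Build a check that passes if any case-insensitive regex matches the code."""
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    return lambda code: any(c.search(code) for c in compiled)


def _leak_evidence(code: str) -> bool:
    """Crude leak heuristic: memory is allocated but never freed in the snippet."""
    return bool(re.search(r"\b(malloc|calloc)\s*\(", code)) and not re.search(r"\bfree\s*\(", code)


# Hypothetical evidence rules per vulnerability type; illustrative, far from complete.
EVIDENCE: Dict[str, Callable[[str], bool]] = {
    # SQL keyword followed by a string literal concatenated with + or . onto something.
    "sql injection": _any_pattern(r"\b(SELECT|INSERT|UPDATE|DELETE)\b[^\n]*[\"']\s*[+.]",
                                  r"execute\(\s*f[\"']"),
    # Request data echoed straight into PHP output, or assigned to innerHTML.
    "xss": _any_pattern(r"echo\s+\$_(GET|POST|REQUEST)", r"\.innerHTML\s*="),
    # Classic unbounded C string functions.
    "buffer overflow": _any_pattern(r"\b(strcpy|strcat|gets|sprintf)\s*\("),
    "memory leak": _leak_evidence,
}


def triage(findings: List[Finding]) -> Tuple[List[Finding], List[Finding]]:
    """Split LLM findings into (confirmed, needs_review) using the rule evidence."""
    confirmed: List[Finding] = []
    needs_review: List[Finding] = []
    for finding in findings:
        check = EVIDENCE.get(finding.vuln_type.strip().lower())
        if check is not None and check(finding.code):
            confirmed.append(finding)
        else:
            # No rule evidence: keep the finding for a human instead of dropping it.
            needs_review.append(finding)
    return confirmed, needs_review
```

This ranks findings rather than filtering them. Reviewers can handle the confirmed list first, and recall is unchanged because nothing the model flagged is lost.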

## Industry Implications: Balancing Privacy and Capability

Enterprises can balance AI capability and privacy by combining local deployment with prompt engineering. Prompt engineering can help close the capability gap left by a smaller model. The project also documents a complete research process, which gives it educational value.