SCTS: Enabling Large Language Models to Autonomously Detect Code Vulnerabilities via Self-Critique Tree Search

An iterative reasoning framework based on Monte Carlo Tree Search that enables an 8B-parameter model to outperform a 32B-parameter model on out-of-distribution vulnerability detection tasks, achieving self-supervised optimization without labeled data.

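Tags: code security, vulnerability detection, large language models, Monte Carlo Tree Search, self-supervised learning, OOD generalization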

Published 2026-05-27 19:21 | Recent activity 2026-05-27 20:49 | Estimated read 6 min

Section 01

Introduction: SCTS: Enabling Large Language Models to Autonomously Detect Code Vulnerabilities via Self-Critique Tree Search

Section 02

Original Authors and Source

Original Author/Maintainer: zhurui1995
Source Platform: GitHub
Original Title: Self-Critique_Tree_Search: Empowering Large Language Models with Autonomous Reasoning for Novel Code Vulnerability Detection
Original Link: https://github.com/zhurui1995/Self-Critique_Tree_Search
Publication Date: 2026-05-27

Section 03

Background: The Dilemma of Code Vulnerability Detection

In software security, detecting potential vulnerabilities in code remains a core challenge for developers and security engineers. Traditional supervised learning methods perform well on known vulnerability patterns but often struggle with novel, out-of-distribution (OOD) vulnerabilities, because they cannot generalize to patterns absent from their training data.

Meanwhile, large language models (LLMs) have shown strong capabilities in code understanding and generation, yet in zero-shot reasoning scenarios they still suffer from logical inconsistency and low recall. The root cause is that zero-shot prompting reduces the complex vulnerability analysis process to a single-step prediction, even though vulnerability detection inherently requires iterative exploration and deep reasoning.

Section 04

Core Idea of SCTS: Combining Self-Critique and Tree Search

SCTS (Self-Critique Tree Search) proposes a different approach: performing iterative self-supervised optimization at inference time instead of relying on expensive labeled data. The method combines the Monte Carlo Tree Search (MCTS) framework with a large language model, so that the model plays a dual role: a reasoner that generates vulnerability analysis reports and a critic that reviews and revises them.

The core insight of this design is that, for complex reasoning tasks such as vulnerability analysis, explicit search and self-correction are an effective path to robust OOD generalization. By having the model critique and improve its own output, SCTS can gradually converge to logically consistent conclusions without ground truth labels.

Section 05

Methodology: Four-Stage Iterative Cycle

Each iteration of SCTS includes four key stages, forming a complete self-evolution cycle:

Section 06

1. Selection

The agent traverses the existing search tree using the Upper Confidence Bound (UCB) strategy, balancing the exploitation of high-reward analysis paths and the exploration of unexplored paths. This stage selects the most promising nodes (i.e., existing vulnerability analysis reports) for expansion.
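
The selection step can be illustrated with a minimal sketch. It assumes the standard UCB1 formula (mean reward plus an exploration bonus); the summary does not state which UCB variant or exploration constant SCTS uses, so the `Node` class, the `ucb1` and `select` functions, and the constant `c` below are hypothetical names chosen for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One analysis report in the search tree (hypothetical structure)."""
    report: str
    visits: int = 0
    total_reward: float = 0.0
    children: List["Node"] = field(default_factory=list)


def ucb1(child: Node, parent_visits: int, c: float = 1.41) -> float:
    """Standard UCB1 score: mean reward plus an exploration bonus.

    The constant c is an assumption, not a value taken from SCTS.
    """
    if child.visits == 0:
        return math.inf  # unvisited reports are always tried first
    mean = child.total_reward / child.visits
    bonus = c * math.sqrt(math.log(parent_visits) / child.visits)
    return mean + bonus


def select(root: Node) -> Node:
    """Descend from the root, taking the highest-UCB child until a leaf."""
    node = root
    while node.children:
        parent_visits = max(node.visits, 1)  # avoid log(0) on a fresh root
        node = max(node.children, key=lambda ch: ucb1(ch, parent_visits))
    return node
```

With `c = 0` the rule always follows the best-scoring report so far; larger values of `c` push the search toward reports that have been visited less often.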

Section 07

2. Expansion

The selected node is expanded by generating new, potentially better analysis reports. This process includes two reflective steps:

Self-Critique: The large language model first generates a critical review of the selected report, identifying logical flaws, omissions, or weaknesses.
Guided Optimization: Based on the critique, the model generates a new, optimized analysis report.
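
The two reflective steps can be sketched as two consecutive model calls. This is only an illustration under stated assumptions: `llm` stands for any function that takes a prompt string and returns a completion, and the prompt wording is invented here, not taken from the SCTS repository.

```python
from typing import Callable, Tuple

# Hypothetical interface: prompt in, completion out.
LLM = Callable[[str], str]


def expand(code: str, report: str, llm: LLM) -> Tuple[str, str]:
    """Produce a critique of `report`, then a revised report guided by it."""
    # Step 1: self-critique of the selected report.
    critique = llm(
        "Review the vulnerability analysis below. List logical flaws, "
        "omissions, or weaknesses.\n\n"
        f"Code:\n{code}\n\nReport:\n{report}"
    )
    # Step 2: guided optimization, conditioned on the critique.
    revised = llm(
        "Rewrite the vulnerability analysis so that it addresses every "
        "point in the critique.\n\n"
        f"Code:\n{code}\n\nReport:\n{report}\n\nCritique:\n{critique}"
    )
    return critique, revised
```

Returning the critique alongside the revised report keeps it available for later scoring, for example to check whether the revision actually addressed the points raised.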

Section 08

3. Evaluation

An important contribution of SCTS is a reward function that does not require ground truth labels. Each newly generated report is scored as a weighted combination of the following internal quality metrics:

Confidence: A model-assigned score for the report's internal certainty and logical coherence.
Specificity: A rule-based score that rewards reports providing specific, parsable details (e.g., vulnerability type, line number).
Consistency: A model-assigned score for how well the new report addresses the issues identified during the self-critique stage.
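
A label-free reward of this kind might look like the sketch below. Everything numeric here is an assumption: the summary gives neither the weights nor the exact specificity rules, so the `Weights` defaults, the two regular expressions, and the [0, 1] score range are placeholders for illustration only.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Weights:
    """Hypothetical weights; the actual SCTS values are not given here."""
    confidence: float = 0.4
    specificity: float = 0.3
    consistency: float = 0.3


def specificity_score(report: str) -> float:
    """Rule-based score: fraction of concrete, parsable details present."""
    checks = [
        re.search(r"\bCWE-\d+\b", report) is not None,                    # vulnerability type
        re.search(r"\bline\s+\d+\b", report, re.IGNORECASE) is not None,  # line number
    ]
    return sum(checks) / len(checks)


def clamp(x: float) -> float:
    """Keep model-assigned scores inside the assumed [0, 1] range."""
    return max(0.0, min(1.0, x))


def reward(report: str, confidence: float, consistency: float,
           w: Optional[Weights] = None) -> float:
    """Weighted sum of the three metrics; no ground truth label is used."""
    w = w or Weights()
    return (w.confidence * clamp(confidence)
            + w.specificity * specificity_score(report)
            + w.consistency * clamp(consistency))
```

In this sketch `confidence` and `consistency` are passed in as numbers because they would come from separate model calls, while specificity is computed directly from the report text.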
