# SCTS: Enabling Large Language Models to Autonomously Detect Code Vulnerabilities via Self-Critique Tree Search

> An iterative reasoning framework based on Monte Carlo Tree Search that enables an 8B-parameter model to outperform a 32B-parameter model on out-of-distribution vulnerability detection tasks, achieving self-supervised optimization without labeled data.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-05-27T11:21:39.000Z
- Last activity: 2026-05-27T12:49:39.845Z
- Popularity: 154.5
- Keywords: code security, vulnerability detection, large language models, Monte Carlo Tree Search, self-supervised learning, OOD generalization
- Page link: https://www.zingnex.cn/en/forum/thread/scts
- Canonical: https://www.zingnex.cn/forum/thread/scts
- Markdown source: floors_fallback

---

## Original Authors and Source

- **Original Author/Maintainer**: zhurui1995
- **Source Platform**: GitHub
- **Original Title**: Self-Critique_Tree_Search: Empowering Large Language Models with Autonomous Reasoning for Novel Code Vulnerability Detection
- **Original Link**: https://github.com/zhurui1995/Self-Critique_Tree_Search
- **Publication Date**: 2026-05-27

---

## Background: The Dilemma of Code Vulnerability Detection

In software security, detecting potential vulnerabilities in code remains a core challenge for developers and security engineers. Traditional supervised learning methods perform well on known vulnerability patterns, but they often struggle with novel out-of-distribution (OOD) vulnerabilities because they cannot generalize to patterns absent from the training data.

Large language models (LLMs), meanwhile, show strong capabilities in code understanding and generation, yet they still suffer from logical inconsistency and low recall in zero-shot reasoning scenarios. The root cause is that these methods reduce the complex process of vulnerability analysis to a single-step prediction, even though vulnerability detection inherently requires iterative exploration and deep reasoning.

## Core Idea of SCTS: Combining Self-Critique and Tree Search

SCTS (Self-Critique Tree Search) proposes a different solution: performing iterative self-supervised optimization during the reasoning phase instead of relying on expensive labeled data. The method combines the Monte Carlo Tree Search (MCTS) framework with a large language model, so that the model plays a dual role during reasoning: a reasoner that generates vulnerability analysis reports, and a critic that reviews and revises them.

The core insight is that, for complex reasoning tasks such as vulnerability analysis, explicit search and self-correction are an effective path to robust OOD generalization. By having the model critique and improve its own output, SCTS can gradually converge on logically consistent conclusions without ground-truth labels.

## Methodology: Four-Stage Iterative Cycle

Each iteration of SCTS includes four key stages, forming a complete self-evolution cycle:

## 1. Selection

The agent traverses the existing search tree using the Upper Confidence Bound (UCB) strategy, balancing exploitation of high-reward analysis paths against exploration of rarely visited ones. This stage selects the most promising node (i.e., an existing vulnerability analysis report) for expansion.
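The selection step can be sketched as follows. This is a minimal illustration, not the project's implementation: the `Node` structure, the UCB1 form of the bound, the exploration constant `c = 1.4`, and the choice to descend all the way to a leaf are all assumptions, since the description above only states that a UCB strategy is used.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One analysis report in the search tree (hypothetical structure)."""
    report: str
    visits: int = 0
    total_reward: float = 0.0
    children: list["Node"] = field(default_factory=list)


def ucb1(child: Node, parent_visits: int, c: float = 1.4) -> float:
    """UCB1 score: mean reward (exploitation) plus a visit-count bonus (exploration)."""
    if child.visits == 0:
        return math.inf  # unvisited reports are always tried first
    mean = child.total_reward / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def select(root: Node) -> Node:
    """Follow the highest-scoring child from the root until a leaf is reached."""
    node = root
    while node.children:
        parent_visits = max(node.visits, 1)  # guard against log(0)
        node = max(node.children, key=lambda ch: ucb1(ch, parent_visits))
    return node
```

A larger `c` pushes the search toward reports that have been revisited less often; a smaller `c` concentrates effort on the reports with the highest average reward so far.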

## 2. Expansion

The selected node is expanded by generating new, potentially better analysis reports. This process includes two reflective steps:

- **Self-Critique**: The large language model first generates a critical review of the selected report, identifying logical flaws, omissions, or weaknesses.
- **Guided Optimization**: Based on the critique, the model generates a new, optimized analysis report.
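The two reflective steps can be sketched as two chained model calls. The prompt wording and the `llm` callable (any function that maps a prompt string to a completion string) are hypothetical placeholders; the prompts used by the project may differ.

```python
from typing import Callable

# Hypothetical prompt templates, for illustration only.
CRITIQUE_PROMPT = (
    "Review the vulnerability analysis below. List logical flaws, "
    "omissions, and weak points.\n\nCode:\n{code}\n\nReport:\n{report}"
)
REFINE_PROMPT = (
    "Rewrite the vulnerability analysis so that it addresses every point "
    "in the critique.\n\nCode:\n{code}\n\nReport:\n{report}\n\n"
    "Critique:\n{critique}"
)


def expand(code: str, report: str, llm: Callable[[str], str]) -> tuple[str, str]:
    """Run one expansion step and return (critique, refined_report)."""
    # Step 1: self-critique of the selected report.
    critique = llm(CRITIQUE_PROMPT.format(code=code, report=report))
    # Step 2: guided optimization, conditioned on that critique.
    refined = llm(REFINE_PROMPT.format(code=code, report=report, critique=critique))
    return critique, refined
```

Returning the critique alongside the refined report keeps it available for later scoring, for example to judge how well the new report responds to the issues raised.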

## 3. Evaluation

An important contribution of SCTS is a reward function that does not require ground-truth labels. Each newly generated report receives a weighted score based on the following internal quality metrics:

- **Confidence**: A model-assigned score for the internal certainty and logical coherence of the report.
- **Specificity**: A rule-based score that rewards reports providing specific, parsable details (e.g., vulnerability type, line number).
- **Consistency**: A model-assigned score for how well the new report addresses the issues identified during the self-critique stage.
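A label-free reward of this kind might look like the sketch below. The weights, the `[0, 1]` score range, and the two regular expressions (a CWE identifier standing in for "vulnerability type", and a `line N` mention) are assumptions chosen for illustration; this section does not specify the actual rules or weight values.

```python
import re

# Hypothetical weights: the metrics are described as weighted, but no
# values are given in this section.
WEIGHTS = {"confidence": 0.4, "specificity": 0.3, "consistency": 0.3}

# Illustrative patterns for "specific, parsable details".
CWE_PATTERN = re.compile(r"\bCWE-\d+\b", re.IGNORECASE)
LINE_PATTERN = re.compile(r"\bline\s+\d+\b", re.IGNORECASE)


def specificity_score(report: str) -> float:
    """Rule-based score: 0.5 for a vulnerability type, 0.5 for a line number."""
    score = 0.0
    if CWE_PATTERN.search(report):
        score += 0.5
    if LINE_PATTERN.search(report):
        score += 0.5
    return score


def reward(report: str, confidence: float, consistency: float) -> float:
    """Weighted sum of the three metrics, each assumed to lie in [0, 1].

    `confidence` and `consistency` are expected to come from the model's
    own evaluation of the report; only specificity is computed by rule.
    """
    parts = {
        "confidence": confidence,
        "specificity": specificity_score(report),
        "consistency": consistency,
    }
    return sum(WEIGHTS[name] * value for name, value in parts.items())
```

Because specificity is computed by fixed rules instead of by the model, it acts as a check the model cannot raise simply by rating its own report more favorably.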