L-DSGraph: Innovative Application of Lightweight Dual-Stream Gated Graph Neural Network in Statement-Level Software Defect Localization

This article introduces L-DSGraph, a dual-stream graph neural network framework for statement-level software defect localization. The model achieves accurate identification and ranking of defective code statements by fusing spectrum-based defect localization features, lexical features, and abstract syntax tree (AST) structural features.

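Tags: software defect localization, graph neural networks, code analysis, machine learning, software engineering, SBFL, abstract syntax tree, deep learning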
Published 2026-06-09 21:16 | Recent activity 2026-06-09 21:19 | Estimated read 5 min

Section 01

L-DSGraph: Lightweight Dual-Stream Gated GNN for Statement-Level Defect Localization (Introduction)

This post introduces L-DSGraph, a lightweight dual-stream gated graph neural network framework for statement-level software defect localization. It fuses three complementary information sources: spectrum-based fault localization (SBFL) features, lexical features, and abstract syntax tree (AST) structural features. From these it identifies and ranks the code statements most likely to be defective. The project is open source on GitHub (https://github.com/koukouneed/L-DSGraph), released by author koukouneed on June 9, 2026.

Section 02

Background: Challenges in Statement-Level Defect Localization

In software development, locating and fixing bugs is resource-intensive. Traditional approaches rely on developer experience and debugging tools, and they become less efficient as codebases grow. Statement-level defect localization aims to pinpoint the specific faulty lines rather than only the enclosing function or module. This is hard for two reasons: the number of candidate statements is very large, and the few defective statements are hidden among many correct ones. Recent machine learning and deep learning methods, especially graph neural networks (GNNs), which are well suited to modeling code structure, offer new ways to approach the problem.

Section 03

L-DSGraph Framework: Fusion of Three Information Sources

L-DSGraph integrates three key feature types:

  1. SBFL Features: Combines results from SBFL formulas (Ochiai, Zoltar) with test case coverage matrices and pass/fail status to generate initial suspiciousness scores.
  2. Lexical Features: Uses hash-based encoding (64-dimensional by default) or learnable token embeddings to capture identifiers, keywords, and operators.
  3. AST Structural Features: Encodes AST node types into learnable embeddings and uses AST edges to build GNN graph structures, capturing code hierarchy and control flow.
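
To make the SBFL step concrete, here is a minimal sketch of how Ochiai and Zoltar suspiciousness scores can be computed from a coverage matrix and pass/fail results. The formulas are the standard published ones. The function names, the toy coverage matrix, and the data layout are assumptions for illustration and are not taken from the L-DSGraph repository.

```python
import math

def spectrum_counts(coverage, passed):
    """Count, per statement, how many failing (ef) and passing (ep) tests execute it.

    coverage[t][s] is 1 if test t executes statement s, else 0.
    passed[t] is True if test t passed.
    """
    n_stmts = len(coverage[0])
    ef = [0] * n_stmts
    ep = [0] * n_stmts
    for row, ok in zip(coverage, passed):
        for s, hit in enumerate(row):
            if hit:
                if ok:
                    ep[s] += 1
                else:
                    ef[s] += 1
    return ef, ep

def ochiai(ef, ep, total_failed):
    """Ochiai: ef / sqrt(total_failed * (ef + ep)); 0 when undefined."""
    denom = math.sqrt(total_failed * (ef + ep))
    return ef / denom if denom else 0.0

def zoltar(ef, ep, total_failed):
    """Zoltar: ef / (ef + nf + ep + 10000 * nf * ep / ef); 0 when ef is 0."""
    if ef == 0:
        return 0.0
    nf = total_failed - ef  # failing tests that do NOT execute the statement
    return ef / (ef + nf + ep + 10000.0 * nf * ep / ef)

# Toy data (hypothetical): 4 tests x 3 statements, the last two tests fail.
coverage = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 1],
    [0, 0, 1],
]
passed = [True, True, False, False]

ef, ep = spectrum_counts(coverage, passed)
total_failed = passed.count(False)
scores = [ochiai(f, p, total_failed) for f, p in zip(ef, ep)]
print(scores)  # statement 2 is covered only by failing tests, so it scores highest
```

In this toy case statement 2 is executed by both failing tests and by no passing test, so both formulas assign it the maximum score of 1.0. Scores like these would then serve as one input feature per statement rather than as the final ranking.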
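
The lexical and AST features can be sketched in a similar way. The example below is only an illustration under stated assumptions: it uses Python's built-in `ast` module as a stand-in parser, crude whitespace tokenization, and MD5-based feature hashing. The post does not say which parser, tokenizer, or hash function L-DSGraph uses, and the names `hash_encode` and `ast_graph` are hypothetical.

```python
import ast
import hashlib

DIM = 64  # same size as the 64-dimensional default mentioned in the post

def hash_encode(tokens, dim=DIM):
    """Bag-of-tokens vector built with the hashing trick (no vocabulary needed)."""
    vec = [0.0] * dim
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim
        vec[bucket] += 1.0
    return vec

def ast_graph(source):
    """Return (node_types, edges) where edges are (parent_index, child_index) pairs."""
    node_types, edges = [], []

    def visit(node, parent_idx):
        idx = len(node_types)
        node_types.append(type(node).__name__)  # node type, e.g. "Assign"
        if parent_idx is not None:
            edges.append((parent_idx, idx))
        for child in ast.iter_child_nodes(node):
            visit(child, idx)

    visit(ast.parse(source), None)
    return node_types, edges

statement = "x = a + b"
lexical_vec = hash_encode(statement.split())  # whitespace tokens: x, =, a, +, b
node_types, edges = ast_graph(statement)

# Map each node type to an integer id; a model would look these up in a
# learnable embedding table.
type_ids = {name: i for i, name in enumerate(sorted(set(node_types)))}
print(len(lexical_vec), node_types[:3], len(edges))
```

The node-type ids would index an embedding table, and the edge list would define the graph that the GNN passes messages over. Hash collisions are possible at 64 dimensions, which is the usual trade-off for not maintaining a vocabulary.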

Section 04

Core Technical Mechanisms of L-DSGraph

The framework uses:

  • Dual-Stream Gated Fusion: Processes the feature types in separate streams to avoid interference, then fuses them adaptively through learnable sigmoid gates, a lighter-weight mechanism than Transformer-style fusion.
  • GRU-based Message Passing: Uses a GRU to update node states during GNN message passing, with the aim of capturing sequential dependencies in code execution.
  • Statement Ranking: Outputs a suspiciousness ranking of code statements based on the fused features, helping developers prioritize debugging.
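
The three mechanisms above can be sketched end to end in plain Python. This is a toy illustration and not the repository's implementation: node states are scalars instead of vectors, the weights are fixed numbers instead of learned parameters, and the choice of which signals feed each stream is an assumption. All function and parameter names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_message_pass(h, edges, p):
    """One round of GRU-style message passing over scalar node states.

    h: list of node states; edges: (src, dst) pairs; p: dict of scalar weights.
    """
    msg = [0.0] * len(h)
    for src, dst in edges:  # each node sums the states of its in-neighbours
        msg[dst] += h[src]
    new_h = []
    for h_v, m_v in zip(h, msg):
        z = sigmoid(p["wz"] * m_v + p["uz"] * h_v)           # update gate
        r = sigmoid(p["wr"] * m_v + p["ur"] * h_v)           # reset gate
        cand = math.tanh(p["wh"] * m_v + p["uh"] * r * h_v)  # candidate state
        new_h.append((1.0 - z) * h_v + z * cand)
    return new_h

def gated_fuse(stream_a, stream_b, w_a, w_b, bias):
    """Per statement: g = sigmoid(w_a*a + w_b*b + bias); out = g*a + (1-g)*b."""
    fused = []
    for a, b in zip(stream_a, stream_b):
        g = sigmoid(w_a * a + w_b * b + bias)
        fused.append(g * a + (1.0 - g) * b)
    return fused

def rank_statements(scores):
    """Statement indices ordered from most to least suspicious."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Hypothetical inputs for three statements connected in a chain 0 -> 1 -> 2.
params = {"wz": 0.5, "uz": 0.5, "wr": 0.5, "ur": 0.5, "wh": 1.0, "uh": 1.0}
structural = gru_message_pass([1.0, 2.0, 0.0], [(0, 1), (1, 2)], params)
sbfl = [0.4, 0.1, 1.0]

fused = gated_fuse(sbfl, structural, w_a=1.0, w_b=-1.0, bias=0.0)
print(rank_statements(fused))
```

The gate is a single sigmoid per statement, so the cost of fusion grows linearly with the number of statements. That is the sense in which gating is cheaper than attention-based fusion, which compares every element with every other.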

Section 05

Experimentation and Evaluation of L-DSGraph

  • Baselines: Compared against GCN, GAT, GraMuS, Grace, GNET4FL, DEEP-FL, CNN/RNN baselines, and ablation variants.
  • Dataset: ConDefects (~700 MB, including coverage information, ASTs, and defect labels).
  • Environment: Python 3.9+, PyTorch 2.0+ (CUDA 11.7+ recommended), 16 GB+ RAM, and 8 GB+ GPU memory.
  • Metrics: Top-N accuracy, Mean First Rank (MFR), and Mean Top-N (MTN).
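
For readers unfamiliar with the ranking metrics, the sketch below shows how Top-N and Mean First Rank are commonly computed in the fault localization literature: Top-N counts the buggy programs whose first faulty statement is ranked within the top N, and MFR averages the rank of that first faulty statement. The repository may define them slightly differently, and MTN is left out because the post does not define it. The function names and sample rankings are hypothetical.

```python
def first_faulty_rank(ranking, faulty):
    """1-based position of the first faulty statement in a ranking, or None."""
    for pos, stmt in enumerate(ranking, start=1):
        if stmt in faulty:
            return pos
    return None

def top_n(results, n):
    """Number of programs with at least one faulty statement in the top n."""
    hits = 0
    for ranking, faulty in results:
        rank = first_faulty_rank(ranking, faulty)
        if rank is not None and rank <= n:
            hits += 1
    return hits

def mean_first_rank(results):
    """Average rank of the first faulty statement (lower is better)."""
    ranks = [first_faulty_rank(r, f) for r, f in results]
    ranks = [r for r in ranks if r is not None]
    return sum(ranks) / len(ranks) if ranks else float("nan")

# Each entry: (statements ordered by suspiciousness, set of truly faulty statements).
results = [
    ([2, 0, 1], {2}),  # fault ranked 1st
    ([0, 1, 2], {1}),  # fault ranked 2nd
    ([1, 2, 0], {0}),  # fault ranked 3rd
]

print(top_n(results, 1), top_n(results, 3), mean_first_rank(results))  # 1 3 2.0
```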

Section 06

Practical Application Value of L-DSGraph

L-DSGraph helps:

  • Accelerate Debugging: Guides developers to the most suspicious lines, reducing debugging time.
  • Aid Code Review: Assists reviewers in identifying potential defects.
  • Optimize Regression Testing: Prioritizes test cases covering high-suspiciousness areas.
  • Education: Helps learners understand defect localization logic and common code errors.

Section 07

Summary and Future Outlook

L-DSGraph advances statement-level defect localization by fusing multiple feature types in a lightweight architecture. Its open-source release, which includes models, baselines, documentation, and the dataset, is useful to both researchers and practitioners. One future direction is integration with large language models (LLMs) to improve semantic understanding and interpretability.