# L-DSGraph: Innovative Application of Lightweight Dual-Stream Gated Graph Neural Network in Statement-Level Software Defect Localization

> This article introduces L-DSGraph, a dual-stream graph neural network framework for statement-level software defect localization. The model achieves accurate identification and ranking of defective code statements by fusing spectrum-based defect localization features, lexical features, and abstract syntax tree (AST) structural features.

- Board: [Openclaw Geo](https://www.zingnex.cn/en/forum/board/openclaw-geo)
- Published: 2026-06-09T13:16:11.000Z
- Last activity: 2026-06-09T13:19:03.589Z
- Popularity: 150.9
- Keywords: software defect localization, graph neural networks, code analysis, machine learning, software engineering, SBFL, abstract syntax tree, deep learning
- Page link: https://www.zingnex.cn/en/forum/thread/l-dsgraph
- Canonical: https://www.zingnex.cn/forum/thread/l-dsgraph

---

## L-DSGraph: Lightweight Dual-Stream Gated GNN for Statement-Level Defect Localization (Introduction)

This post introduces L-DSGraph, a lightweight dual-stream gated graph neural network framework for statement-level software defect localization. It fuses three complementary information sources: spectrum-based fault localization (SBFL) features, lexical features, and abstract syntax tree (AST) structural features. From these it identifies and ranks defect-prone code statements. The project is open source on GitHub (https://github.com/koukouneed/L-DSGraph), published by the author koukouneed on June 9, 2026.

## Background: Challenges in Statement-Level Defect Localization

In software development, locating and fixing bugs is resource-intensive. Traditional approaches rely on developer experience and debugging tools, which become inefficient as codebases grow. Statement-level defect localization aims to pinpoint the specific faulty lines rather than just the enclosing function or module. This is hard because the model must rank a very large number of statements, and the few defective ones are hidden among many correct ones. Recent machine learning and deep learning methods, especially graph neural networks (GNNs), which are well suited to modeling code structure, offer new ways to address the problem.

## L-DSGraph Framework: Fusion of Three Information Sources

L-DSGraph integrates three key feature types:
1. **SBFL Features**: Combines results from SBFL formulas (Ochiai, Zoltar) with test case coverage matrices and pass/fail status to generate initial suspiciousness scores.
2. **Lexical Features**: Uses hash-based encoding (default 64D) or learnable token embeddings to capture identifiers, keywords, and operators.
3. **AST Structural Features**: Encodes AST node types into learnable embeddings and uses AST edges to build GNN graph structures, capturing code hierarchy and control flow.
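
To make the first two feature types concrete, here is a minimal sketch in plain Python. It computes the standard Ochiai suspiciousness score, `ef / sqrt((ef + nf) * (ef + ep))`, from a coverage matrix and pass/fail results, and builds a 64-dimensional hashed bag-of-tokens vector for one statement. The function names, the input layout, and the use of CRC32 as the hash are assumptions made for illustration; the L-DSGraph repository may organize and encode these features differently.

```python
import math
import zlib


def ochiai_scores(coverage, test_passed):
    """Ochiai suspiciousness per statement.

    coverage[t][s] is 1 if test t executes statement s, else 0.
    test_passed[t] is True if test t passed.
    (Hypothetical input layout, chosen for this sketch.)
    """
    total_failed = sum(1 for passed in test_passed if not passed)  # ef + nf
    n_statements = len(coverage[0]) if coverage else 0
    scores = []
    for s in range(n_statements):
        # ef / ep: failing / passing tests that execute statement s
        ef = sum(1 for row, ok in zip(coverage, test_passed) if row[s] and not ok)
        ep = sum(1 for row, ok in zip(coverage, test_passed) if row[s] and ok)
        denom = math.sqrt(total_failed * (ef + ep))
        scores.append(ef / denom if denom > 0 else 0.0)
    return scores


def hashed_token_vector(tokens, dim=64):
    """Bag-of-tokens vector using the hashing trick (CRC32 is an assumed choice)."""
    vec = [0.0] * dim
    for tok in tokens:
        vec[zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    return vec


# Two passing tests and one failing test over three statements.
coverage = [
    [1, 1, 0],  # test 0 (pass)
    [1, 1, 0],  # test 1 (pass)
    [1, 0, 1],  # test 2 (fail)
]
test_passed = [True, True, False]

scores = ochiai_scores(coverage, test_passed)
lexical = hashed_token_vector(["if", "x", ">", "0", ":"])
print(scores)
print(len(lexical))
```

In this toy input, the statement executed only by the failing test receives the highest score, and the statement never executed by a failing test receives zero. A deterministic hash such as CRC32 is used instead of Python's built-in `hash`, which is salted per process for strings.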

## Core Technical Mechanisms of L-DSGraph

The framework uses:
- **Dual-Stream Gated Fusion**: Processes the feature types in separate streams to avoid interference, then fuses them adaptively through learnable sigmoid gates, a lighter-weight alternative to Transformer-based fusion.
- **GRU-based Message Passing**: Uses GRU for message propagation in GNNs to capture sequential dependencies in code execution.
- **Statement Ranking**: Outputs a suspiciousness rank of code statements based on fused features, helping developers prioritize debugging.
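
The gated fusion idea can be sketched in a few lines of plain Python. The form below, a per-dimension gate `g = sigmoid(W [h_a; h_b] + b)` followed by `g * h_a + (1 - g) * h_b`, is one common way to write such a gate and is an assumption here; the post does not give the exact equation, and names such as `gated_fusion`, `h_a`, and `h_b` are hypothetical. In a real model `W` and `b` would be trained parameters (for example in PyTorch), not random values.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(h_a, h_b, weights, bias):
    """Fuse two equal-length stream vectors with a per-dimension sigmoid gate.

    weights has shape [dim][2 * dim] and bias has shape [dim].
    Each output element is a convex combination of h_a[i] and h_b[i].
    """
    concat = h_a + h_b  # list concatenation: [h_a; h_b]
    fused = []
    for i in range(len(h_a)):
        z = sum(w * x for w, x in zip(weights[i], concat)) + bias[i]
        g = sigmoid(z)  # gate value in (0, 1)
        fused.append(g * h_a[i] + (1.0 - g) * h_b[i])
    return fused


random.seed(0)
dim = 4
h_struct = [0.2, -0.5, 1.0, 0.3]  # e.g. a structural-stream vector (illustrative)
h_lex = [0.9, 0.1, -0.4, 0.3]     # e.g. a lexical-stream vector (illustrative)
W = [[random.uniform(-1, 1) for _ in range(2 * dim)] for _ in range(dim)]
b = [0.0] * dim

print(gated_fusion(h_struct, h_lex, W, b))
```

Because the gate is a single linear map plus a sigmoid, its cost grows linearly with the number of statements, which is consistent with the "lightweight" framing compared with attention over all statement pairs.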

## Experimentation and Evaluation of L-DSGraph

- **Baselines**: Compared against GCN, GAT, GraMuS, Grace, GNET4FL, DEEP-FL, CNN/RNN baselines, and ablation variants.
- **Dataset**: ConDefects (~700MB, including coverage information, ASTs, and defect labels).
- **Environment**: Python 3.9+, PyTorch 2.0+ (CUDA 11.7+ recommended), 16GB+ RAM, and 8GB+ GPU memory.
- **Metrics**: Top-N accuracy, Mean First Rank (MFR), and Mean Top-N (MTN).
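
As a rough illustration of two of these metrics, the sketch below computes Top-N and MFR from per-program suspiciousness scores. It assumes the usual definitions from the fault localization literature: Top-N counts the programs whose first faulty statement is ranked within the top N, and MFR averages the rank of the first faulty statement. The function names and the tie-breaking rule (ties keep source order) are assumptions, and MTN is omitted because the post does not define it.

```python
def first_faulty_rank(scores, faulty):
    """1-based rank of the highest-ranked faulty statement, or None if none is found.

    scores[i] is the suspiciousness of statement i; faulty is a set of indices.
    Ties keep source order because Python's sort is stable.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    for rank, idx in enumerate(order, start=1):
        if idx in faulty:
            return rank
    return None


def top_n_and_mfr(programs, n):
    """Return (Top-N count, Mean First Rank) over (scores, faulty) pairs."""
    ranks = [first_faulty_rank(scores, faulty) for scores, faulty in programs]
    found = [r for r in ranks if r is not None]
    top_n = sum(1 for r in found if r <= n)
    mfr = sum(found) / len(found) if found else float("nan")
    return top_n, mfr


# Three toy programs: (suspiciousness scores, indices of faulty statements).
programs = [
    ([0.9, 0.1, 0.5], {0}),       # faulty statement ranked 1st
    ([0.2, 0.8, 0.6, 0.1], {2}),  # faulty statement ranked 2nd
    ([0.3, 0.4, 0.1], {2}),       # faulty statement ranked 3rd
]

print(top_n_and_mfr(programs, n=1))
print(top_n_and_mfr(programs, n=3))
```

Top-N is reported here as a count; dividing by the number of programs gives it as an accuracy. Lower MFR is better, since it means the first faulty statement appears earlier in the ranking.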

## Practical Application Value of L-DSGraph

L-DSGraph helps:
- **Accelerate Debugging**: Guides developers to the most suspicious lines, reducing debugging time.
- **Aid Code Review**: Assists reviewers in identifying potential defects.
- **Optimize Regression Testing**: Prioritizes test cases covering high-suspiciousness areas.
- **Education**: Helps learners understand defect localization logic and common code errors.

## Summary and Future Outlook

L-DSGraph advances statement-level defect localization by fusing multiple feature types in a lightweight architecture. Its open-source release (models, baselines, documentation, and dataset) is intended to benefit both researchers and practitioners. One future direction is integration with large language models (LLMs) to improve semantic understanding and interpretability.
