
LogicLoc: When Large Language Models Meet Datalog, Code Localization Enters a New Paradigm

Researchers found that existing code localization models over-rely on keyword matching, so they proposed the LogicLoc framework, which combines LLMs with Datalog-based logical reasoning to locate code through precise structural reasoning rather than keyword cues.

Tags: Code Localization · Large Language Models · Datalog · Neuro-Symbolic AI · Software Engineering · Program Analysis · Agentic Workflows
Published 2026-04-17 20:49 · Recent activity 2026-04-20 09:49 · Estimated read: 4 min

Section 01

Introduction: LogicLoc—A New Paradigm for Code Localization Combining LLMs and Datalog

Researchers found that existing code localization models exhibit a "keyword shortcut" bias: they over-rely on lexical keyword matching. They proposed the LogicLoc framework, which combines Large Language Models (LLMs) with Datalog logical reasoning to reason precisely over code structure without relying on keyword cues, bringing a new paradigm to code localization.

Section 02

Background: Keyword Shortcut Problem of Existing Models and Challenges in Structural Reasoning

Existing code localization models rely on keyword matching (e.g., file paths, function names), and their performance drops sharply when keywords are removed, exposing a lack of structural reasoning ability. The core challenge of code localization is understanding the semantic structure of code, yet traditional methods suffer from weak generalization, shallow semantic understanding, and dependence on identifier naming. The research team therefore defined a new task, "keyword-agnostic logical code localization," and built the KA-LogicQuery diagnostic benchmark around it.
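The keyword-shortcut failure mode can be illustrated with a minimal sketch (the snippets, function names, and query below are illustrative assumptions, not taken from the paper or from KA-LogicQuery): a lexical match succeeds only while the telltale identifier survives, whereas a structural check over the AST is indifferent to renaming.

```python
import ast

# Two semantically identical snippets; the second has its "keyword"
# identifiers renamed, which defeats lexical matching.
named = "def load_config(path):\n    return open(path).read()\n"
renamed = "def f1(a):\n    return open(a).read()\n"

def keyword_locate(source: str, keyword: str) -> bool:
    # Lexical shortcut: succeeds only if the identifier survives renaming.
    return keyword in source

def structural_locate(source: str) -> bool:
    # Structural query: "a function whose body calls open(...)" --
    # a stand-in for a keyword-agnostic logical query.
    tree = ast.parse(source)
    for fn in (n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)):
        for call in (n for n in ast.walk(fn) if isinstance(n, ast.Call)):
            if isinstance(call.func, ast.Name) and call.func.id == "open":
                return True
    return False

print(keyword_locate(named, "config"), keyword_locate(renamed, "config"))  # True False
print(structural_locate(named), structural_locate(renamed))                # True True
```

Benchmarks built from keyword-rich issue reports reward the first strategy; removing the keywords, as KA-LogicQuery does, leaves only the second one standing.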

Section 03

Method: Design of LogicLoc's Neuro-Symbolic Hybrid Architecture

The LogicLoc framework consists of three stages:

1. Program Fact Extraction: statically analyze the codebase to generate a Datalog fact base.
2. Datalog Program Synthesis: the LLM generates a Datalog query program from the natural-language question and the fact patterns.
3. Verification and Feedback Optimization: a Parser-Gated mechanism checks the synthesized program and guides corrections.

Technical innovations include a deterministic reasoning engine, verifiable intermediate representations, and efficient token usage.
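The first two stages can be sketched end to end in Python (the relation names, the toy codebase, and the naive fixpoint evaluator are illustrative assumptions, not LogicLoc's actual schema or engine): extract `calls(caller, callee)` facts from source, then evaluate a recursive Datalog-style rule such as `reaches(F, G) :- calls(F, G).` / `reaches(F, G) :- calls(F, H), reaches(H, G).` to answer a structural question.

```python
import ast

# Toy codebase standing in for the repository under analysis.
source = """
def parse(x): return validate(x)
def validate(x): return normalize(x)
def normalize(x): return x
def unrelated(): pass
"""

def extract_calls(src: str) -> set:
    """Stage 1 sketch: extract calls(caller, callee) facts (the Datalog EDB)."""
    facts = set()
    for fn in ast.walk(ast.parse(src)):
        if isinstance(fn, ast.FunctionDef):
            for node in ast.walk(fn):
                if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                    facts.add((fn.name, node.func.id))
    return facts

def reaches(calls: set) -> set:
    """Stage 2 sketch: naive bottom-up fixpoint for the recursive rule
    reaches(F, G) :- calls(F, G).
    reaches(F, G) :- calls(F, H), reaches(H, G)."""
    closure = set(calls)
    while True:
        new = {(f, g) for (f, h) in closure for (h2, g) in closure if h == h2}
        if new <= closure:
            return closure
        closure |= new

calls = extract_calls(source)
# Structural question: "which functions eventually reach normalize?"
answer = sorted(f for (f, g) in reaches(calls) if g == "normalize")
print(answer)  # ['parse', 'validate']
```

A production Datalog engine would evaluate such rules semi-naively rather than recomputing the whole closure each round, but the deterministic, verifiable character of the query is the same: the answer follows from the facts and rules alone, with no keyword matching involved.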

Section 04

Evidence: Experimental Results of Dual Breakthroughs in Performance and Efficiency

On the KA-LogicQuery benchmark (keywords removed), LogicLoc significantly outperforms existing SOTA models, and it remains competitive on traditional keyword-rich benchmarks. On the efficiency side, it reduces token consumption, speeds up execution, and scales better.

Section 05

Conclusion: Feasibility and Value of the Neuro-Symbolic Hybrid Path

LogicLoc demonstrates the advantages of a neuro-symbolic hybrid architecture on structural reasoning tasks: Datalog provides interpretability and verifiability, offering important insights for AI-assisted software engineering.

Section 06

Suggestions: Benchmark Optimization and Future Directions

Future work should improve benchmarks so that they evaluate genuine reasoning ability, explore neuro-symbolic hybrids across more software engineering tasks, and build more reliable, interpretable AI-assisted development tools.