Zing Forum

KGdLLM: Learning Logical Reasoning on Knowledge Graphs Using Discrete Diffusion Models

KGdLLM is an experimental framework for studying how discrete masked diffusion language models (MDM/LLaDA style) acquire knowledge and perform logical reasoning on knowledge graphs. This article examines its decoupled architecture, training pipeline, and evaluation methods.

Tags: diffusion models, knowledge graphs, logical reasoning, LLaDA, MDM, discrete diffusion, SFT, pre-training
Published 2026-05-18 13:34 | Recent activity 2026-05-18 13:53 | Estimated read 7 min

Section 01

KGdLLM Framework Guide: Exploration of Discrete Diffusion Models in Knowledge Graph Reasoning

KGdLLM is an experimental research framework created by Tieumi221E. It explores how well discrete masked diffusion language models (MDM/LLaDA style) acquire knowledge and reason logically over knowledge graphs. This article covers its decoupled architecture, training pipeline, and evaluation methods, and discusses the potential of diffusion models for structured knowledge reasoning.

Section 02

Background: Basics of Discrete Masked Diffusion Language Models

Autoregressive vs. Diffusion Generation Paradigms

Autoregressive models (e.g., GPT, Llama) generate text one token at a time from left to right, so early mistakes propagate to later tokens (error accumulation) and each prediction sees only the left context. Diffusion models instead pair a forward noising process, which gradually masks tokens, with a reverse denoising process, which recovers the original tokens. This gives them bidirectional context, the ability to correct predictions iteratively, and the potential for parallel decoding.

MDM and LLaDA

KGdLLM draws on two lines of work on discrete diffusion: masked diffusion models (MDM), which trace back to the absorbing-state variant of D3PM proposed by Austin et al. in 2021 and mask each token independently via Bernoulli sampling, and LLaDA, which refines the masking strategy and training objective. The diffusion_core module implements the core algorithm.
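
To make the Bernoulli masking step concrete, here is a minimal sketch of forward noising on a tokenized triple. The `[MASK]` string, the `forward_mask` name, and the whitespace tokenization are illustrative assumptions, not taken from the project's masking.py.

```python
import random

MASK = "[MASK]"  # hypothetical mask token, not the project's actual vocabulary entry

def forward_mask(tokens, p, rng):
    """Mask each token independently with probability p (Bernoulli sampling)."""
    noisy, is_masked = [], []
    for tok in tokens:
        hit = rng.random() < p  # rng.random() lies in [0, 1)
        noisy.append(MASK if hit else tok)
        is_masked.append(hit)
    return noisy, is_masked

rng = random.Random(0)
tokens = "Alice father_of Bob".split()
for p in (0.0, 0.5, 1.0):
    noisy, _ = forward_mask(tokens, p, rng)
    print(p, noisy)
```

At p = 0 the sequence is untouched and at p = 1 every token is masked; intermediate values give the partially noised sequences the model learns to repair.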

Section 03

Decoupled Architecture: Separation of Core Engine and Experimental Logic

Core Engine (diffusion_core/)

The core engine consists of four independent, reusable modules:

  • model.py: bidirectional Transformer architecture;
  • masking.py: LLaDA-style forward noising;
  • loss.py: masked cross-entropy with 1/p importance weighting;
  • inference.py: block-level parallel decoding with confidence-based re-masking.
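
The loss described for loss.py can be sketched in plain Python as follows. This is only an illustration: it assumes that p is the masking probability used for the sample, and the per-token normalization and the `masked_ce_loss` name are choices made here, not confirmed details of the project.

```python
import math

def masked_ce_loss(probs, targets, is_masked, p):
    """Cross-entropy over masked positions only, scaled by 1/p.

    probs[i] maps candidate tokens to the predicted probability at position i;
    targets[i] is the original token; is_masked[i] marks noised positions.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    total = 0.0
    for dist, target, masked in zip(probs, targets, is_masked):
        if masked:  # unmasked positions contribute nothing to the loss
            total += -math.log(dist[target])
    # Dividing by p compensates for how many positions were masked;
    # dividing by the sequence length is an assumed normalization.
    return total / (p * len(targets))

targets = ["Alice", "father_of", "Bob"]
is_masked = [False, True, True]
probs = [
    {"Alice": 1.0},
    {"father_of": 0.5, "mother_of": 0.5},
    {"Bob": 0.25, "Carol": 0.75},
]
print(masked_ce_loss(probs, targets, is_masked, p=0.5))
```

Without the 1/p factor, lightly masked samples would contribute far less to the objective than heavily masked ones simply because they have fewer masked positions.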

Experimental Scripts (scripts/)

  • Data pipeline: prepare_kg_dataset.py converts triples into training text format;
  • Training scripts: train_mdm.py (bidirectional mask pre-training), train_sft.py (supervised fine-tuning);
  • Evaluation and analysis: eval_all_checkpoints.py (multi-dimensional reasoning evaluation), plot_results.py (visualization), plot_summary.py (comparative analysis).
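
As an illustration of the data pipeline step, the sketch below renders triples as training lines. The sentence template and the `triple_to_text` helper are hypothetical; the actual output format of prepare_kg_dataset.py is not described in this article.

```python
def triple_to_text(head, relation, tail):
    """Render one (head, relation, tail) triple as a single training line."""
    # Assumed convention: relation names use underscores in place of spaces.
    return f"{head} {relation.replace('_', ' ')} {tail}."

triples = [
    ("Alice", "is_the_father_of", "Bob"),
    ("Bob", "is_the_father_of", "Carol"),
]
lines = [triple_to_text(*t) for t in triples]
print("\n".join(lines))
```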

Section 04

Training Pipeline: From Pre-training to Supervised Fine-tuning

Pre-training Phase (Knowledge Acquisition)

Knowledge graph triples are first converted into text sequences. Forward noising then masks tokens at a dynamically sampled ratio, and the model predicts the tokens at the masked positions. Training minimizes masked cross-entropy loss with importance weighting, which is how the model learns the structured knowledge.
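
For reference, the pre-training objective published for LLaDA has the form below. Whether KGdLLM uses exactly this form is an assumption; here t denotes the sampled masking ratio (presumably the "p" in the 1/p weighting), x_0 the original sequence of length L, x_t its noised version, and M the mask token.

```latex
\mathcal{L}(\theta) = -\,\mathbb{E}_{t,\,x_0,\,x_t}
\left[ \frac{1}{t} \sum_{i=1}^{L}
\mathbf{1}\!\left[x_t^{i} = \mathrm{M}\right]
\log p_\theta\!\left(x_0^{i} \mid x_t\right) \right]
```

The indicator restricts the sum to masked positions, and the 1/t factor reweights samples drawn at different masking ratios.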

Supervised Fine-tuning Phase (Logical Reasoning)

The model is fine-tuned on instruction-formatted dialogue data (e.g., reasoning questions) with a plain text generation objective, so that it learns to apply the acquired knowledge to logical reasoning.
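
One common way to fine-tune a masked diffusion model, described for LLaDA, is to leave the prompt intact and mask only response tokens, so the loss covers the answer alone. The article does not say whether train_sft.py works this way; the sketch below assumes it, and `mask_response_only` is a hypothetical helper.

```python
import random

MASK = "[MASK]"  # hypothetical mask token

def mask_response_only(prompt, response, p, rng):
    """Noise the response with probability p per token; never touch the prompt."""
    noisy_response = [MASK if rng.random() < p else tok for tok in response]
    # The loss is computed only where a response token was masked.
    loss_positions = [False] * len(prompt) + [tok == MASK for tok in noisy_response]
    return prompt + noisy_response, loss_positions

prompt = "Q: Who is the father of Bob ? A:".split()
response = ["Alice"]
sequence, loss_positions = mask_response_only(prompt, response, 1.0, random.Random(0))
print(sequence)
print(loss_positions)
```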

Section 05

Evaluation Dimensions: Multi-dimensional Logical Reasoning Tests

Reverse Relation Reasoning

Tests the model's understanding of relation directionality, e.g., inferring "B is A's child" from "A is B's father".
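
A reverse-relation test set could be built along the following lines. The relation names and the `INVERSE` table are made up for illustration; the project's actual relation vocabulary is not given here.

```python
# Hypothetical inverse-relation table.
INVERSE = {"father_of": "child_of"}

def reverse_queries(triples, inverse):
    """For each (h, r, t) whose relation has a known inverse, emit (t, r_inv, h)."""
    return [(t, inverse[r], h) for h, r, t in triples if r in inverse]

train = [("A", "father_of", "B"), ("A", "lives_in", "X")]
print(reverse_queries(train, INVERSE))
```

Only the forward triples would be shown during training; the reversed ones are held out and used as evaluation queries.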

Multi-hop Reasoning

Tests the model's ability to infer indirect relations through intermediate relations, e.g., inferring "A is C's grandfather" from "A is B's father" and "B is C's father".
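
Two-hop evaluation facts can be derived by composing relations through a shared intermediate entity. The sketch below assumes a hypothetical composition table `COMPOSE`; it is not taken from the project's evaluation script.

```python
# Hypothetical rule: father_of followed by father_of yields grandfather_of.
COMPOSE = {("father_of", "father_of"): "grandfather_of"}

def two_hop_facts(triples, compose):
    """Derive (h, r, t) facts implied by chaining two triples via a middle entity."""
    by_head = {}
    for h, r, t in triples:
        by_head.setdefault(h, []).append((r, t))
    derived = []
    for h, r1, mid in triples:
        for r2, t in by_head.get(mid, []):
            rel = compose.get((r1, r2))
            if rel is not None:
                derived.append((h, rel, t))
    return derived

facts = [("A", "father_of", "B"), ("B", "father_of", "C")]
print(two_hop_facts(facts, COMPOSE))
```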

Transitive Relation Reasoning

Tests the model's understanding of transitivity, e.g., inferring "A is greater than C" from "A is greater than B" and "B is greater than C".
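
For a transitive relation, the held-out test facts are the pairs in the transitive closure that were not stated directly. A minimal sketch, with function names chosen here for illustration:

```python
def transitive_closure(pairs):
    """Return every (a, c) implied by chaining (a, b) and (b, c) repeatedly."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

def held_out_transitive(pairs):
    """Pairs that follow by transitivity but were never stated directly."""
    return transitive_closure(pairs) - set(pairs)

stated = [("A", "B"), ("B", "C")]  # read as "A is greater than B", and so on
print(held_out_transitive(stated))
```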

Section 06

Technical Highlights and Research Directions

Technical Highlights

  • Block-level parallel decoding: predicts multiple tokens per step, which in theory speeds up inference;
  • Confidence-based re-masking: re-masks low-confidence predictions and predicts them again, akin to reconsidering an answer;
  • Bidirectional Transformer: attends to all tokens in the sequence, which supports reasoning over context in both directions.
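
The first two highlights can be combined in a short decoding loop. The sketch below is a simplified illustration with a stand-in predictor: `decode`, `toy_predict`, and the fixed confidence values are hypothetical, and the project's inference.py may schedule commits differently.

```python
MASK = "[MASK]"  # hypothetical mask token

def decode(length, predict, steps, block_size):
    """Fill masked slots block by block, committing the most confident guesses first."""
    seq = [MASK] * length
    for start in range(0, length, block_size):
        block = range(start, min(start + block_size, length))
        for step in range(steps):
            masked = [i for i in block if seq[i] == MASK]
            if not masked:
                break
            # Predict every masked slot in the block in one parallel pass.
            guesses = {i: predict(seq, i) for i in masked}  # i -> (token, confidence)
            # Commit enough slots to finish within the remaining steps; the
            # low-confidence rest stay masked and are predicted again.
            remaining_steps = steps - step
            keep = -(-len(masked) // remaining_steps)  # ceiling division
            best = sorted(masked, key=lambda i: guesses[i][1], reverse=True)[:keep]
            for i in best:
                seq[i] = guesses[i][0]
    return seq

# Stand-in for a trained model: fixed answers with fixed confidences.
TARGET = ["Alice", "is", "the", "father", "of", "Bob"]
CONF = [0.9, 0.4, 0.5, 0.8, 0.3, 0.7]

def toy_predict(seq, i):
    return TARGET[i], CONF[i]

print(decode(len(TARGET), toy_predict, steps=2, block_size=3))
```

With a real model, the predictions for slots left masked can change on the next pass because more context has been filled in, which is where the iterative correction comes from.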

Research Directions

Possible directions include hybrid architectures that combine the advantages of autoregressive and diffusion models, and extending the framework to more reasoning tasks.

Section 07

Limitations and Future Improvement Directions

Limitations

  • Experiments mainly use synthetic datasets; performance on real large-scale KGs (e.g., Wikidata) has not been verified;
  • The model scale is limited, so scalability is unknown;
  • The iterative denoising process is still slower than autoregressive decoding.

Future Directions

Validate on large-scale KGs, explore hybrid architectures, and extend to more logical reasoning tasks.

Section 08

Summary: Value and Prospects of Diffusion Models in Knowledge Reasoning

KGdLLM provides a clear experimental platform for applying diffusion models to structured knowledge reasoning. Bidirectional context awareness and iterative correction open up new possibilities in this area. Although still experimental, the project is a useful reference for researchers working on diffusion language models and knowledge graph reasoning. Project address: https://github.com/Tieumi221E/kg-diffusion-lm