KGdLLM: Learning Logical Reasoning on Knowledge Graphs Using Discrete Diffusion Models

KGdLLM is an experimental framework that explores the knowledge acquisition and logical reasoning capabilities of discrete masked diffusion language models (MDM/LLaDA style) on knowledge graphs. This article examines its decoupled architecture, training pipeline, and evaluation methods.

Tags: Diffusion models, Knowledge graphs, Logical reasoning, LLaDA, MDM, Discrete diffusion, SFT, Pre-training

Published 2026-05-18 13:34 | Recent activity 2026-05-18 13:53 | Estimated read 7 min

Section 01

KGdLLM Framework Guide: Exploration of Discrete Diffusion Models in Knowledge Graph Reasoning

KGdLLM is an experimental research framework created by Tieumi221E. It explores how well discrete masked diffusion language models (MDM/LLaDA style) acquire knowledge from knowledge graphs and reason logically over it. This article covers its decoupled architecture, training pipeline, and evaluation methods, and discusses the potential of diffusion models for structured knowledge reasoning.

Section 02

Background: Basics of Discrete Masked Diffusion Language Models

Autoregressive vs. Diffusion Generation Paradigms

Traditional autoregressive models (e.g., GPT, Llama) generate text left to right, which can accumulate errors and offers no global view of the sequence. Diffusion models instead pair a forward noising process (gradually masking tokens) with a reverse denoising process (recovering the original tokens), which gives them bidirectional context, iterative correction, and the potential for parallel decoding.
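
The forward noising step can be illustrated with a minimal sketch, assuming each token is masked independently with probability t. The function name `forward_mask` and the `[MASK]` string are hypothetical and chosen for illustration; they are not taken from the KGdLLM code.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol, chosen for illustration


def forward_mask(tokens, t, rng):
    """Mask each token independently with probability t (0 <= t <= 1)."""
    noised, is_masked = [], []
    for tok in tokens:
        hide = rng.random() < t  # one Bernoulli(t) draw per token
        noised.append(MASK if hide else tok)
        is_masked.append(hide)
    return noised, is_masked


rng = random.Random(0)  # seeded so the run is repeatable
tokens = ["Alice", "father_of", "Bob"]
noised, is_masked = forward_mask(tokens, t=0.5, rng=rng)
```

At t = 0 nothing is masked and at t = 1 every token is masked; the reverse process is a model trained to undo this corruption.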

MDM and LLaDA

KGdLLM draws on two lines of discrete diffusion work: masked diffusion models (MDM), which trace back to the absorbing-state discrete diffusion of Austin et al. (2021) and mask tokens via Bernoulli sampling, and LLaDA, which refines the masking strategy and training objective. Its diffusion_core module implements the core algorithm.

Section 03

Decoupled Architecture: Separation of Core Engine and Experimental Logic

Core Engine (diffusion_core/)

The core engine consists of four independent, reusable modules: model.py (bidirectional Transformer architecture), masking.py (LLaDA-style forward noising), loss.py (masked cross-entropy with 1/p importance weighting, where p is the masking probability), and inference.py (block-level parallel decoding with confidence-based re-masking).

Experimental Scripts (scripts/)

Data pipeline: prepare_kg_dataset.py converts triples into training text format;
Training scripts: train_mdm.py (bidirectional mask pre-training), train_sft.py (supervised fine-tuning);
Evaluation and analysis: eval_all_checkpoints.py (multi-dimensional reasoning evaluation), plot_results.py (visualization), plot_summary.py (comparative analysis).
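
As a rough illustration of the data pipeline step, the sketch below turns triples into training sentences. The article does not state the text format that prepare_kg_dataset.py actually emits, so the template, the function name `triple_to_text`, and the sample triples are all assumptions.

```python
def triple_to_text(head, relation, tail):
    """Render one (head, relation, tail) triple as a plain sentence.

    The template is a guess for illustration, not the project's format.
    """
    return f"{head} {relation.replace('_', ' ')} {tail}."


# Hypothetical sample triples.
triples = [("Alice", "father_of", "Bob"), ("Bob", "father_of", "Carol")]
corpus = [triple_to_text(*t) for t in triples]
```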

Section 04

Training Pipeline: From Pre-training to Supervised Fine-tuning

Pre-training Phase (Knowledge Acquisition)

Knowledge graph triples are first converted into text sequences. Forward noising with a dynamic mask ratio then hides part of each sequence, and the model is trained to predict the tokens at the masked positions under a masked cross-entropy loss combined with 1/p importance weighting. This is the stage in which it acquires structured knowledge.
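
A minimal sketch of such a loss follows, assuming the 1/p factor is the reciprocal of the masking probability and that the sum is normalized by sequence length, as in the LLaDA formulation. The function name and the input layout are hypothetical; loss.py may differ in both.

```python
import math


def masked_ce_loss(true_token_probs, is_masked, p_mask):
    """Cross-entropy over masked positions only, scaled by 1 / p_mask.

    true_token_probs[i] is the probability the model assigns to the
    original token at position i. Unmasked positions contribute nothing.
    """
    total = sum(-math.log(p) for p, m in zip(true_token_probs, is_masked) if m)
    # Normalizing by sequence length is an assumption of this sketch.
    return total / (p_mask * len(true_token_probs))


loss = masked_ce_loss([0.5, 0.9, 0.25], [True, False, True], p_mask=0.5)
```

One reading of the 1/p factor is that it compensates for how few positions are masked at low mask ratios, so that samples drawn at different ratios contribute comparably to the objective.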

Supervised Fine-tuning Phase (Logical Reasoning)

The model is then fine-tuned on instruction-formatted dialogue data (e.g., reasoning questions) with a pure text-generation objective, so that it learns to apply the acquired knowledge to logical reasoning.

Section 05

Evaluation Dimensions: Multi-dimensional Logical Reasoning Tests

Reverse Relation Reasoning

Tests the model's understanding of relation directionality, e.g., inferring "B is A's child" from "A is B's father".
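
A reverse-relation test case could be derived from a training fact as in the sketch below. The relation names and the `INVERSE` table are hypothetical; the article does not describe how eval_all_checkpoints.py builds its queries.

```python
# Hypothetical inverse table: "A is B's father" implies "B is A's child".
INVERSE = {"father_of": "child_of"}


def reverse_query(triple, inverse):
    """Swap head and tail and replace the relation with its inverse."""
    head, rel, tail = triple
    return (tail, inverse[rel], head)


expected = reverse_query(("A", "father_of", "B"), INVERSE)
```

The model sees only the forward fact during training and is then asked to complete the reversed one.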

Multi-hop Reasoning

Tests the model's ability to infer indirect relations through intermediate relations, e.g., inferring "A is C's grandfather" from "A is B's father" and "B is C's father".
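
Gold answers for two-hop questions can be generated by composing relations, as in this sketch. The composition table `COMPOSE` and the function name `two_hop` are assumptions made for illustration.

```python
# Hypothetical composition rule: father_of followed by father_of.
COMPOSE = {("father_of", "father_of"): "grandfather_of"}


def two_hop(triples, compose):
    """Derive every fact reachable by chaining two triples through a shared entity."""
    derived = set()
    for h1, r1, t1 in triples:
        for h2, r2, t2 in triples:
            if t1 == h2 and (r1, r2) in compose:
                derived.add((h1, compose[(r1, r2)], t2))
    return derived


facts = [("A", "father_of", "B"), ("B", "father_of", "C")]
derived = two_hop(facts, COMPOSE)
```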

Transitive Relation Reasoning

Tests the model's understanding of transitivity, e.g., inferring "A is greater than C" from "A is greater than B" and "B is greater than C".
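
For a transitive relation, the facts a model should be able to infer are those in the transitive closure that were never stated directly. The sketch below computes them for a small example; the function name and the data are hypothetical.

```python
def transitive_closure(pairs):
    """Return all (a, c) pairs implied by chaining (a, b) and (b, c)."""
    closure = set(pairs)
    changed = True
    while changed:  # repeat until no new pair can be added
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


stated = {("A", "B"), ("B", "C")}  # read as "A is greater than B", etc.
held_out = transitive_closure(stated) - stated  # facts to test the model on
```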

Section 06

Technical Highlights and Research Directions

Technical Highlights

Block-level parallel decoding: Can predict multiple tokens simultaneously per step, which in theory improves inference speed;
Confidence-based re-masking: Re-masks low-confidence predictions and predicts them again, similar to reconsidering an answer;
Bidirectional Transformer: Attends to all tokens during encoding, which supports reasoning over context in both directions.
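
The first two highlights can be sketched together for a single block: every masked position is predicted in parallel, the most confident predictions are committed, and the rest are re-masked for the next round. This is a simplified sketch in the spirit of LLaDA's low-confidence re-masking, not the code in inference.py. The names `decode_block` and `toy_predict`, the even per-step schedule, and the hard-coded confidences are all assumptions.

```python
MASK = "[MASK]"  # hypothetical mask symbol


def decode_block(tokens, predict, steps):
    """Fill every MASK in `tokens` within at most `steps` rounds.

    `predict(tokens)` must return one (token, confidence) pair per position.
    """
    tokens = list(tokens)
    for step in range(steps):
        masked = [i for i, tok in enumerate(tokens) if tok == MASK]
        if not masked:
            break
        guesses = predict(tokens)  # all positions predicted in parallel
        # Commit an even share of the remaining masks (ceiling division).
        keep = -(-len(masked) // (steps - step))
        # Keep the most confident guesses; the others stay masked.
        best = sorted(masked, key=lambda i: guesses[i][1], reverse=True)[:keep]
        for i in best:
            tokens[i] = guesses[i][0]
    return tokens


def toy_predict(tokens):
    """Stand-in for the model: fixed answers with made-up confidences."""
    target = ["Alice", "is", "the", "grandfather", "of", "Carol"]
    conf = [0.9, 0.8, 0.7, 0.2, 0.6, 0.5]
    return list(zip(target, conf))


prompt = ["Alice", MASK, MASK, MASK, MASK, "Carol"]
result = decode_block(prompt, toy_predict, steps=2)
```

With two steps, the first round commits the two most confident positions and the second round fills the remaining two. A full decoder would presumably apply this routine to successive blocks from left to right.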

Research Directions

Explore hybrid architectures (combining the advantages of autoregressive and diffusion models), expand to more reasoning tasks, etc.

Section 07

Limitations and Future Improvement Directions

Limitations

Mainly uses synthetic datasets; performance on real large-scale KGs (e.g., Wikidata) has not been verified;
Limited model scale; scalability is unknown;
The iterative denoising process is still slower than autoregressive decoding.

Future Directions

Verify on large-scale KGs, explore hybrid architectures, expand to more logical reasoning tasks.

Section 08

Summary: Value and Prospects of Diffusion Models in Knowledge Reasoning

KGdLLM provides a clear experimental platform for applying diffusion models to structured knowledge reasoning. Its bidirectional context awareness and iterative correction open up new possibilities in this field. Although still experimental, it is a useful reference for researchers working on diffusion language models and knowledge graph reasoning. Project address: https://github.com/Tieumi221E/kg-diffusion-lm
