
Automap: An Automated Knowledge Graph Generation System Based on Multi-Agent Architecture

Automap is an agent pipeline that uses large language models (LLMs) and LangGraph to automate RML mapping and knowledge graph (KG) materialization. Its decentralized multi-agent architecture covers the full process, from CSV schema analysis to final KG validation.

Tags: knowledge graph, RML, YARRRML, LangGraph, multi-agent, automated mapping, ontology, SHACL, SPARQL, data integration

Published 2026-05-29 16:45 | Recent activity 2026-05-29 16:49 | Estimated read 7 min


Section 01

Automap Project Guide: An Automated Knowledge Graph Generation System Based on Multi-Agent Architecture

This article introduces Automap, an agent pipeline that uses large language models (LLMs) and LangGraph to automate RML mapping and knowledge graph (KG) materialization. Its core features are a decentralized multi-agent architecture, multi-level self-correcting validation, and a terminal-first observability design. The pipeline covers the full process from CSV schema analysis to final KG validation, and it targets a specific pain point: writing RML mapping rules by hand is slow and error-prone.

Section 02

Project Background and Problem Definition

In data integration and the Semantic Web, converting structured data such as CSV files into knowledge graphs (KGs) is a common but complex task. Traditional approaches rely on domain experts to write RML mapping rules by hand, which is time-consuming and error-prone. Automap builds on the rise of LLMs and aims to provide a fully automated agent pipeline that completes the entire process, from CSV schema analysis to KG materialization, without human intervention.

Section 03

System Architecture and Core Process

Automap adopts a decentralized multi-agent architecture and uses LangGraph to coordinate the agents and manage state transitions. The core process has six stages: 1. Schema analysis (extracting CSV column names, sample values, and data types); 2. Ontology reconnaissance (parsing the classes, object properties, and data properties in the ontology files); 3. Semantic mapping (LLM reasoning that links CSV columns to ontology concepts); 4. Schema alignment (planning the entity structure and cross-references); 5. Competency question generation (used later for validation); 6. YARRRML generation (split across three parallel agents, PrefixAgent, EntityAgent, and RelationshipAgent, with prefix reuse via KV-Cache).
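The first stage, schema analysis, can be illustrated with a minimal sketch that uses only the Python standard library. It is not Automap's actual implementation: the function names `infer_type` and `analyze_schema`, the coarse type labels, and the sample data are all assumptions made for illustration.

```python
import csv
import io


def infer_type(values):
    """Guess a coarse type label from a column's non-empty values."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "empty"
    # Try the strictest numeric type first, then fall back.
    for label, cast in (("integer", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
            return label
        except ValueError:
            continue
    return "string"


def analyze_schema(csv_text, sample_size=3):
    """Return one record per column: name, sample values, inferred type."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    schema = []
    for column in reader.fieldnames or []:
        values = [row[column] or "" for row in rows]
        schema.append(
            {
                "column": column,
                "samples": values[:sample_size],
                "type": infer_type(values),
            }
        )
    return schema


# Hypothetical input, used only to exercise the sketch.
SAMPLE = "id,name,height_m\n1,Ada,1.70\n2,Linus,1.82\n"

for record in analyze_schema(SAMPLE):
    print(record)
```

A compact summary like this (names, a few samples, a type guess) is the kind of input the later semantic mapping stage could pass to an LLM instead of the full file.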

Section 04

Self-Correction and Validation Mechanisms

Automap has built-in multi-level validation to ensure output quality: 1. Syntax validation (using the yatter tool, with up to 10 retries); 2. Logical refinement (checking for broken mappings, missing columns, and similar problems, with up to 6 retries); 3. SHACL validation (a three-level fallback strategy: Astrea API → local rdflib → structural fallback); 4. SPARQL CQ validation (competency questions are converted into SPARQL ASK queries and executed against an in-memory pyoxigraph store, so no external endpoint is needed).
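The retry limits above describe a bounded generate-validate-repair loop. The sketch below shows that control flow in plain Python under several assumptions: `refine_with_retries` is a hypothetical name, the limit is treated as a total number of attempts, and the toy validator and generator are stand-ins for yatter and an LLM agent, not their real interfaces.

```python
def refine_with_retries(generate, validate, max_attempts):
    """Regenerate a candidate until validation passes or attempts run out.

    generate(feedback) returns a new candidate given the previous errors.
    validate(candidate) returns a list of errors (empty means valid).
    Returns (candidate, remaining_errors, attempts_used).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    feedback = []
    candidate = None
    for attempt in range(1, max_attempts + 1):
        candidate = generate(feedback)
        feedback = validate(candidate)
        if not feedback:
            return candidate, [], attempt
    return candidate, feedback, max_attempts


# Stand-in validator: only checks that two top-level keys are present.
REQUIRED_KEYS = ("prefixes:", "mappings:")


def toy_validate(text):
    return [key for key in REQUIRED_KEYS if key not in text]


def make_toy_generator():
    """Stand-in for an LLM agent: repairs one reported problem per call."""
    parts = []

    def generate(feedback):
        if feedback:
            parts.append(feedback[0])
        return "\n".join(parts)

    return generate


text, errors, attempts = refine_with_retries(make_toy_generator(), toy_validate, 10)
print(f"valid={not errors} after {attempts} attempt(s)")
```

Returning the remaining errors instead of raising lets a caller decide whether an exhausted budget should stop the pipeline or pass the best candidate on to the next validation level.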

Section 05

Terminal-First Observability Design

Automap adopts a "terminal-first" design and does not rely on cloud tools such as LangSmith. Developers can follow a run directly in the console: real-time phase tracking, a per-phase timing summary, logical refinement feedback, syntax validation status (PASS/FAIL with an error summary), SHACL results (number of violations and their sources), and CQ validation details. This design suits settings where data is sensitive or network access is restricted.
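Console-only phase tracking with a timing summary can be sketched with a small context manager. This is an illustration of the idea rather than Automap's code: the class name `PhaseTracker`, the log format, and the phase names are assumptions.

```python
import time
from contextlib import contextmanager


class PhaseTracker:
    """Print phase start/end lines and keep per-phase durations."""

    def __init__(self, clock=time.perf_counter, out=print):
        self.clock = clock      # injectable so timings can be tested
        self.out = out          # injectable sink, defaults to the console
        self.durations = {}     # phase name -> seconds, in run order

    @contextmanager
    def phase(self, name):
        self.out(f"[START] {name}")
        started = self.clock()
        status = "FAIL"
        try:
            yield
            status = "PASS"
        finally:
            # Runs on success and on error, so failed phases are timed too.
            elapsed = self.clock() - started
            self.durations[name] = elapsed
            self.out(f"[{status}] {name} ({elapsed:.2f}s)")

    def summary(self):
        lines = [f"{name:<24}{secs:>8.2f}s" for name, secs in self.durations.items()]
        lines.append(f"{'TOTAL':<24}{sum(self.durations.values()):>8.2f}s")
        return "\n".join(lines)


tracker = PhaseTracker()
with tracker.phase("schema_analysis"):      # hypothetical phase name
    time.sleep(0.01)
with tracker.phase("yarrrml_generation"):   # hypothetical phase name
    time.sleep(0.01)
print(tracker.summary())
```

Because everything goes through a plain `out` callable, the same tracker works in an air-gapped environment and can be redirected to a file without touching any external service.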

Section 06

Technical Implementation Details

Technical details include: 1. Environment management (uv handles Python dependencies, Docker containerization is supported, and Morph-KGC compatibility patches are applied automatically); 2. Model configuration (the model for each agent can be set through environment variables such as LLM_MODEL_DEFAULT and LLM_MODEL_SCHEMA); 3. Multi-level evaluation (pipeline success rate, precision/recall/F1 against reference (gold-standard) KGs, column coverage, and CQ coverage).
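The evaluation metrics can be made concrete with a short sketch. It assumes that a KG is compared to the reference KG as a set of exactly matching (subject, predicate, object) triples and that column coverage is the share of CSV columns used by the mapping; the source does not state how Automap computes these, and the function names and data below are hypothetical.

```python
def triple_scores(generated, gold):
    """Precision, recall, and F1 over exact-match triple sets."""
    generated, gold = set(generated), set(gold)
    true_positives = len(generated & gold)
    precision = true_positives / len(generated) if generated else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


def column_coverage(mapped_columns, csv_columns):
    """Fraction of CSV columns that the mapping actually references."""
    csv_columns = set(csv_columns)
    if not csv_columns:
        return 0.0
    return len(set(mapped_columns) & csv_columns) / len(csv_columns)


# Hypothetical reference and generated triples.
GOLD = {
    ("ex:p1", "rdf:type", "ex:Person"),
    ("ex:p1", "ex:name", "Ada"),
    ("ex:p2", "rdf:type", "ex:Person"),
    ("ex:p2", "ex:name", "Linus"),
}
GENERATED = {
    ("ex:p1", "rdf:type", "ex:Person"),
    ("ex:p1", "ex:name", "Ada"),
    ("ex:p2", "ex:name", "Linux"),  # wrong literal, counts as a miss
}

precision, recall, f1 = triple_scores(GENERATED, GOLD)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
print(f"column coverage={column_coverage(['id', 'name'], ['id', 'name', 'height_m']):.3f}")
```

Exact matching is strict: a single wrong literal or IRI turns a triple into both a false positive and a false negative, which is why it is useful to report column coverage and CQ coverage alongside F1.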

Section 07

Application Scenarios and Value

Automap is suitable for: 1. Enterprise data integration (converting CSV from legacy systems to standard KGs); 2. Academic research (quickly building domain ontology datasets); 3. Data governance (unifying semantic representation of multi-source data); 4. Low-code KG construction (lowering technical barriers).

Section 08

Summary and Future Outlook

Automap represents an important direction for LLM-driven data engineering: it automates a complex ETL process through an agent architecture. Its decentralized YARRRML generation, multi-level validation, and terminal-first observability offer a reference for similar projects. Planned future work includes supporting more data sources, handling more complex mapping scenarios, and integrating with other KG toolchains.
