Zing Forum

Automap: An Automated Knowledge Graph Generation System Based on Multi-Agent Architecture

Automap is an agent pipeline that uses large language models (LLMs) and LangGraph to implement automated RML mapping and knowledge graph (KG) materialization. It adopts a decentralized multi-agent architecture to complete the full process from CSV schema analysis to final KG validation.

Tags: knowledge graph, RML, YARRRML, LangGraph, multi-agent, automated mapping, ontology, SHACL, SPARQL, data integration
Published 2026-05-29 16:45 | Recent activity 2026-05-29 16:49 | Estimated read 7 min

Section 01

Automap Project Guide: An Automated Knowledge Graph Generation System Based on Multi-Agent Architecture

This article introduces Automap, an agent pipeline that uses large language models (LLMs) and LangGraph to automate RML mapping and knowledge graph (KG) materialization. Its core features are a decentralized multi-agent architecture, multi-level self-correcting validation, and a terminal-first observability design. It covers the full process from CSV schema analysis to final KG validation, and aims to replace the manual writing of RML mapping rules, which is time-consuming and error-prone.

Section 02

Project Background and Problem Definition

In data integration and the Semantic Web, converting structured data such as CSV into knowledge graphs (KGs) is a common but complex task. Traditional methods rely on domain experts to write RML mapping rules by hand, which is time-consuming and error-prone. With the rise of LLMs, the Automap project set out to build a fully automated agent pipeline that completes the entire process, from CSV schema analysis to KG materialization, without human intervention.

Section 03

System Architecture and Core Process

Automap adopts a decentralized multi-agent architecture, using LangGraph to coordinate the agents and manage state transitions. The core process consists of six steps:

1. Schema analysis: extracting CSV column names, sample values, and data types.
2. Ontology reconnaissance: parsing the classes, object properties, and data properties in the ontology files.
3. Semantic mapping: LLM reasoning that connects CSV columns with ontology concepts.
4. Schema alignment: planning the entity structure and cross-references.
5. Competency question generation: producing questions that are used later for validation.
6. YARRRML generation: split across three parallel agents (PrefixAgent, EntityAgent, and RelationshipAgent), with prefix reuse via KV-Cache.
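To make the first step concrete, here is a minimal sketch of what schema analysis could look like using only the Python standard library. It is not Automap's actual code: the function names `infer_type` and `analyze_schema`, the coarse type labels, and the sample CSV are all assumptions made for illustration.

```python
import csv
import io


def infer_type(values):
    """Return a coarse type label for a list of raw string values."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "empty"
    # Try the strictest numeric type first, then fall back.
    for label, cast in (("integer", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
            return label
        except ValueError:
            continue
    return "string"


def analyze_schema(csv_text, sample_size=3):
    """Collect sample values and an inferred type for every CSV column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    schema = {}
    for column in reader.fieldnames or []:
        # Short rows yield None for missing cells; treat them as empty.
        values = [row[column] or "" for row in rows]
        schema[column] = {
            "samples": values[:sample_size],
            "type": infer_type(values),
        }
    return schema


# Hypothetical input, used only to exercise the sketch.
example = "id,name,height_m\n1,Ada,1.65\n2,Grace,1.70\n"
schema = analyze_schema(example)
for column, info in schema.items():
    print(column, info["type"], info["samples"])
```

A summary like this (column name, a few samples, a type guess) is the kind of compact context an LLM agent would need in the later semantic mapping step, without sending the whole file.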

Section 04

Self-Correction and Validation Mechanisms

Automap has built-in multi-level validation to ensure output quality:

1. Syntax validation: performed with the yatter tool, with up to 10 retries.
2. Logical refinement: checks for broken mappings, missing columns, and similar issues, with up to 6 retries.
3. SHACL validation: a three-level strategy that falls back from the Astrea API to local rdflib, and then to a structural fallback.
4. SPARQL CQ validation: competency questions are converted into SPARQL ASK queries and executed against an in-memory pyoxigraph store, with no external endpoint required.
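Two control-flow patterns recur in the validation levels above: a bounded validate-and-repair loop (syntax validation and logical refinement) and an ordered fallback chain (SHACL validation). The sketch below shows both in plain Python. It is an illustration under assumptions, not Automap's implementation: `refine_until_valid`, `first_working`, and the toy validator and repair functions are hypothetical names, and a real pipeline would call yatter and an LLM where the toy functions stand in.

```python
def refine_until_valid(draft, validate, repair, max_retries):
    """Validate a draft and repair it on failure, up to max_retries times.

    validate(draft) returns a list of error strings (empty means valid).
    repair(draft, errors) returns a revised draft.
    Returns (draft, errors, attempts); errors is empty on success.
    """
    errors = validate(draft)
    attempts = 0
    while errors and attempts < max_retries:
        draft = repair(draft, errors)
        errors = validate(draft)
        attempts += 1
    return draft, errors, attempts


def first_working(strategies):
    """Try (name, callable) pairs in order; return the first that does not raise."""
    failures = []
    for name, run in strategies:
        try:
            return name, run()
        except Exception as exc:  # a real pipeline would catch narrower errors
            failures.append(f"{name}: {exc}")
    raise RuntimeError("all strategies failed: " + "; ".join(failures))


# Toy stand-ins (hypothetical): a validator that requires two sections,
# and a repair step that adds the first missing one.
def toy_validate(draft):
    return [f"missing section: {key}"
            for key in ("prefixes", "mappings") if key + ":" not in draft]


def toy_repair(draft, errors):
    missing = errors[0].split(": ")[1]
    return draft + f"\n{missing}: {{}}"


fixed, remaining, attempts = refine_until_valid(
    "sources: {}", toy_validate, toy_repair, max_retries=10
)


def remote_shapes():
    raise ConnectionError("service unreachable")


def local_shapes():
    return "shapes derived locally"


source, shapes = first_working([
    ("remote", remote_shapes),
    ("local", local_shapes),
    ("structural", lambda: "structural checks only"),
])
```

Passing the validator's error list back into the repair step is what makes the loop self-correcting, and the retry cap keeps a draft that cannot be fixed from looping forever.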

Section 05

Terminal-First Observability Design

Automap adopts a "terminal-first" design and does not rely on cloud tools like LangSmith. Developers can follow the whole run in the console: real-time phase tracking, a per-phase time summary, logical refinement feedback, syntax validation status (PASS/FAIL with an error summary), SHACL results (number of violations and their sources), and CQ validation details. This design suits scenarios with strict data-privacy requirements or restricted network access.
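As a rough illustration of terminal-first phase tracking and a time summary, the following sketch uses a context manager to print each phase's status and duration to the console. The `PhaseTracker` class, its output format, and the phase names are assumptions for this example and are not taken from Automap.

```python
import time
from contextlib import contextmanager


class PhaseTracker:
    """Print phase start/end lines and keep per-phase durations."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self.durations = {}  # phase name -> elapsed seconds, in run order

    @contextmanager
    def phase(self, name):
        print(f"[start] {name}")
        started = self._clock()
        status = "FAIL"
        try:
            yield
            status = "PASS"
        finally:
            # Record the duration even when the phase raised.
            elapsed = self._clock() - started
            self.durations[name] = elapsed
            print(f"[{status}] {name} ({elapsed:.3f}s)")

    def summary(self):
        total = sum(self.durations.values())
        lines = [f"{name:<24}{secs:>8.3f}s"
                 for name, secs in self.durations.items()]
        lines.append(f"{'total':<24}{total:>8.3f}s")
        return "\n".join(lines)


tracker = PhaseTracker()
with tracker.phase("schema_analysis"):
    time.sleep(0.01)  # placeholder for real work
with tracker.phase("yarrrml_generation"):
    time.sleep(0.01)
print(tracker.summary())
```

Because everything goes to standard output, a run can be inspected or logged without sending any data to an external tracing service.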

Section 06

Technical Implementation Details

Technical details include:

1. Environment management: uv handles Python dependency management, Docker containerization is supported, and Morph-KGC compatibility patches are applied automatically.
2. Model configuration: models for different agents are set flexibly via environment variables such as LLM_MODEL_DEFAULT and LLM_MODEL_SCHEMA.
3. Multi-level evaluation: pipeline success rate, precision/recall/F1 against reference KGs, column coverage, and CQ coverage.
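Two of the items above lend themselves to a short sketch: per-agent model selection and triple-level scoring. The code below assumes that agent-specific variables follow an `LLM_MODEL_<AGENT>` pattern and fall back to `LLM_MODEL_DEFAULT`; only those two variable names appear in the source, so the pattern, the fallback, and the placeholder model names are assumptions. The scoring function uses the standard definitions, with F1 = 2PR / (P + R), over sets of triples; how Automap actually matches triples is not specified here.

```python
import os


def model_for(agent, env=None):
    """Return the model for an agent, falling back to LLM_MODEL_DEFAULT.

    Assumption: agent-specific variables are named LLM_MODEL_<AGENT>.
    """
    env = os.environ if env is None else env
    specific = env.get(f"LLM_MODEL_{agent.upper()}")
    return specific or env.get("LLM_MODEL_DEFAULT")


def triple_scores(predicted, gold):
    """Precision, recall, and F1 of predicted triples against a reference set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Placeholder configuration and data, for illustration only.
env = {"LLM_MODEL_DEFAULT": "model-a", "LLM_MODEL_SCHEMA": "model-b"}
schema_model = model_for("schema", env)    # agent-specific override
mapping_model = model_for("mapping", env)  # falls back to the default

gold = {
    ("ex:1", "rdf:type", "ex:Person"),
    ("ex:1", "ex:name", "Ada"),
    ("ex:2", "rdf:type", "ex:Person"),
    ("ex:2", "ex:name", "Grace"),
}
predicted = {
    ("ex:1", "rdf:type", "ex:Person"),
    ("ex:1", "ex:name", "Ada"),
    ("ex:2", "rdf:type", "ex:Person"),
    ("ex:2", "ex:name", "Grac"),  # one wrong literal
}
precision, recall, f1 = triple_scores(predicted, gold)
```

In this toy data, three of the four predicted triples match the reference, so precision and recall both come out at 0.75.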

Section 07

Application Scenarios and Value

Automap is suitable for:

1. Enterprise data integration: converting CSV from legacy systems to standard KGs.
2. Academic research: quickly building domain ontology datasets.
3. Data governance: unifying the semantic representation of multi-source data.
4. Low-code KG construction: lowering technical barriers.

Section 08

Summary and Future Outlook

Automap illustrates an important direction for LLM-driven data engineering: automating complex ETL processes through an agent architecture. Its decentralized YARRRML generation, multi-level validation, and terminal observability offer a reference for similar projects. Future work includes supporting more data sources, handling more complex mapping scenarios, and integrating other KG toolchains.