Zing Forum

Reading

OntologIA: An Open-Source Framework for Automatically Building Knowledge Graphs Using Large Language Models

The OntologIA project, open-sourced by Italy's National Institute of Statistics (ISTAT), automatically extracts ontology structures from unstructured text using Large Language Models (LLMs), generates JSON-formatted ontologies and interactive visual graphs, and provides a complete methodological toolset for the field of knowledge engineering.

知识图谱本体设计大语言模型语义提取ISTATStreamlitPython开源工具
Published 2026-04-30 22:14Recent activity 2026-04-30 22:18Estimated read 7 min
OntologIA: An Open-Source Framework for Automatically Building Knowledge Graphs Using Large Language Models
1

Section 01

Introduction to the OntologIA Project: An LLM-Driven Open-Source Framework for Automated Knowledge Graph Construction

The OntologIA project, open-sourced by Italy's National Institute of Statistics (ISTAT), focuses on using Large Language Models (LLMs) to automatically extract ontology structures from unstructured text, generate JSON-formatted ontologies and interactive visual graphs, and provide a complete methodological toolset for the field of knowledge engineering. This project aims to address the pain points of traditional knowledge graph construction, which relies on experts and is time-consuming and labor-intensive, and promote the development of automated knowledge modeling.

2

Section 02

Project Background and Motivation: Challenges in Traditional Knowledge Graph Construction

Knowledge graphs play a crucial role in fields such as data governance and semantic search, but traditional construction is highly dependent on manual annotation by domain experts and the skills of ontology engineers, which is time-consuming and labor-intensive and difficult to adapt to large-scale data changes. The Methodology Department of ISTAT launched the OntologIA project to explore the integration of LLMs into the ontology design process, realize the automated conversion from raw text to structured ontologies, and provide a reproducible and verifiable methodological framework.

3

Section 03

Core Functions: Automatic Extraction of Three Types of Ontology Elements

The core capabilities of OntologIA are to extract three types of ontology elements from domain text:

  1. Class Recognition: Capture implicit concept hierarchies in text and generate ontology classes;
  2. Object Property Extraction: Identify relationships between concepts and build semantic connections between classes;
  3. Data Property Mapping: Extract concept features, clarify the attribute fields and data types of classes, laying the foundation for instantiation.
4

Section 04

Output and Visualization: Structured JSON and Interactive Graphs

OntologIA provides two core outputs:

  • Structured JSON Ontology: A standardized format that is easy to import into storage systems like Neo4j, reducing integration barriers;
  • Interactive HTML Graph: A visualization component based on Pyvis that converts ontologies into scalable, draggable network diagrams, enhancing interpretability and reviewability.
5

Section 05

Semantic Constraints and Quality Control: Ensuring Output Reliability

OntologIA emphasizes strict semantic constraints:

  • Zero Hallucination Generation: All outputs are strictly limited to the content of the input text; no classes or relationships are created out of thin air;
  • Explicit Semantic Annotation: Use the is_a relationship to clarify concept inheritance hierarchies, ensuring transparency and traceability;
  • Controllable Editing Process: Supports forward/reverse processing modes for flexible adaptation to scenarios. This reflects ISTAT's strict requirements for data quality and auditability.
6

Section 06

Technical Implementation and Usage: Python Stack and Convenient Interaction

OntologIA adopts a Python technology stack, with core dependencies including Streamlit (web interface), OpenAI API (LLM calls), rdflib (RDF processing), Pyvis (visualization), and Pandas (data processing). Users can launch the Streamlit application via commands, completing the process from file upload, prompt customization to result download. The system supports automatic version control to ensure traceability.

7

Section 07

Application Scenarios and Value: Practical Empowerment Across Multiple Domains

The value of OntologIA covers multiple scenarios:

  • Academic Research: Provides a reproducible LLM-assisted ontology design methodology for fields such as knowledge engineering;
  • Government and Enterprises: Helps quickly extract structured knowledge frameworks from policy documents and business manuals;
  • Prototype Development: Lowers the entry barrier for knowledge graph projects, allowing domain experts to participate in modeling;
  • Education and Training: Intuitively demonstrates the relationship between ontologies, knowledge graphs, and LLMs.
8

Section 08

Limitations and Future Outlook: Evolution Direction of the Project

Current version limitations:

  • Lack of exception fault tolerance in JSON parsing;
  • No formal OWL/RDF semantic verification mechanism;
  • Suitable for prototype/experimental scenarios; production applications require evaluation. Future plans: automatically export OWL/RDF standard formats, ontology semantic verification, multilingual support, execution history tracking, ontology comparison tools, etc., moving towards a mature knowledge engineering platform.