# Knowledge Graph-Based Student Counseling Chatbot: From Unstructured Documents to Intelligent Q&A

> This article introduces an undergraduate graduation project that builds a chatbot for student counseling. The system extracts information from PDF documents of university rules and regulations to construct a knowledge graph, then reasons over that graph with knowledge graph embedding models to answer students' complex questions accurately.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-05-09T18:38:22.000Z
- Last activity: 2026-05-09T18:47:33.191Z
- Popularity: 154.8
- Keywords: knowledge graph, chatbot, knowledge graph embedding, TransE, ComplEx, DistMult, natural language processing, student counseling, information extraction, AmpliGraph
- Page link: https://www.zingnex.cn/en/forum/thread/llm-github-davidsamy1-knowledgegraph-chatbot
- Canonical: https://www.zingnex.cn/forum/thread/llm-github-davidsamy1-knowledgegraph-chatbot
- Markdown source: floors_fallback

---

## [Original Post] Guide to the Knowledge Graph-Based Student Counseling Chatbot Project

This undergraduate graduation project builds a chatbot for student counseling. At its core, it extracts information from PDF documents of university rules and regulations to construct a knowledge graph, then reasons over that graph with knowledge graph embedding models (TransE, ComplEx, DistMult) to answer students' complex questions accurately. The goal is to spare students from searching through lengthy documents and academic staff from answering the same questions over and over, turning static documents into an intelligent knowledge base that supports reasoning.

## Project Background and Core Ideas

On university campuses, students often have to consult lengthy, complex regulations to resolve issues such as course selection and graduation. Searching manually is time-consuming and error-prone, and academic staff carry a heavy workload from answering similar questions again and again. To address these problems, the project builds a task-oriented dialogue agent. Its core innovation is to combine a knowledge graph, which represents relationships between entities explicitly, with reasoning by embedding models. This overcomes a limitation of traditional keyword matching and rule engines, which struggle with complex semantics, and lets the agent answer questions such as "Which courses still count for credit after I transfer majors?"

## System Architecture and Four-Stage Processing Flow

The system uses a four-stage pipeline:
1. Input preprocessing: normalize user input with spell checking, grammar correction, and lemmatization (e.g., correcting a misspelled query such as "graduation requirments" to "graduation requirements");
2. Input understanding and entity mapping: use spaCy dependency parsing to extract the core components of the question, and Fuzzywuzzy fuzzy string matching to map the mentioned entities onto knowledge graph nodes;
3. Knowledge graph embedding reasoning: use AmpliGraph to train models such as TransE, DistMult, and ComplEx, which map entities and relations to low-dimensional vectors and predict missing links through vector operations (e.g., given the head entity "Course A" and the relation "meets condition", predict whether the tail entity is "Condition B");
4. Natural language generation: use NLTK/Pattern to turn the resulting triples into fluent answers (e.g., the triple (Advanced Mathematics, is a required course for, Computer Science) becomes "Advanced Mathematics is a required course for the Computer Science major").
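
To make the four stages concrete, here is a minimal, stdlib-only sketch of the data flow under several assumptions: the toy triples and vocabulary are hypothetical, Python's `difflib` stands in for both the spell checker and Fuzzywuzzy, stage 3 is reduced to a direct triple lookup (the actual system scores candidate links with a trained embedding model), and stage 4 uses a fixed sentence template instead of NLTK/Pattern. It illustrates how the stages hand data to one another, not the project's code.

```python
from __future__ import annotations

import difflib

# Hypothetical toy knowledge graph of (head, relation, tail) triples.
TRIPLES = [
    ("Advanced Mathematics", "is a required course for", "Computer Science"),
    ("Linear Algebra", "is a prerequisite of", "Machine Learning"),
]
ENTITIES = sorted({h for h, _, _ in TRIPLES} | {t for _, _, t in TRIPLES})
# Spelling vocabulary: every word in an entity name plus a few question words.
VOCAB = sorted({w.lower() for e in ENTITIES for w in e.split()}
               | {"what", "is", "about", "tell", "me"})


def preprocess(text: str) -> list[str]:
    """Stage 1: lowercase, tokenize, and snap each token to its closest vocabulary word."""
    tokens = text.lower().replace("?", " ").split()
    corrected = []
    for tok in tokens:
        match = difflib.get_close_matches(tok, VOCAB, n=1, cutoff=0.8)
        corrected.append(match[0] if match else tok)
    return corrected


def map_entity(tokens: list[str], threshold: float = 0.8) -> str | None:
    """Stage 2: compare every token window with each entity name; keep the best match."""
    best, best_score = None, 0.0
    for entity in ENTITIES:
        n = len(entity.split())
        for i in range(len(tokens) - n + 1):
            window = " ".join(tokens[i:i + n])
            score = difflib.SequenceMatcher(None, window, entity.lower()).ratio()
            if score > best_score:
                best, best_score = entity, score
    return best if best_score >= threshold else None


def answer(question: str) -> str:
    entity = map_entity(preprocess(question))
    if entity is None:
        return "Sorry, I could not find that in the regulations."
    # Stage 3 (simplified): direct lookup of triples whose head is the entity.
    facts = [triple for triple in TRIPLES if triple[0] == entity]
    if not facts:
        return f"I know about {entity}, but no rule starts from it."
    # Stage 4: template-based generation from the first matching triple.
    head, relation, tail = facts[0]
    return f"{head} {relation} {tail}."


print(answer("Tell me about advnced mathmatics?"))
```

Running it prints `Advanced Mathematics is a required course for Computer Science.`: stage 1 repairs both misspellings, and stage 2 then finds an exact window match for the `Advanced Mathematics` node. The 0.8 cutoffs are arbitrary illustrative thresholds.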

## Technology Stack and Implementation Details

The project is developed in Python 3.10. Its technology stack includes:
- AmpliGraph 2.0.0: knowledge graph embedding and link prediction;
- spaCy 3.5.1: NLP processing (named entity recognition, part-of-speech tagging, dependency parsing, etc.);
- Stanford-OpenIE 1.3.1: triple extraction for building the knowledge graph;
- Flask 2.2.2: the interactive web interface;
- NLTK/Pattern: natural language generation;
- PyPDF2 3.0.1: PDF text extraction.

Knowledge graph construction proceeds in three steps: PDF text extraction → triple extraction with Stanford OpenIE → embedding model training with AmpliGraph.
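
Raw OpenIE output is usually noisy (duplicates that differ only in case or whitespace, clause-length arguments), so some cleaning between extraction and training is typical. The sketch below assumes hypothetical extracted triples and illustrative filtering rules (`max_words` is an arbitrary choice, not a value from the project); it deduplicates, filters, and maps the triples to integer IDs. AmpliGraph itself accepts string triples as a NumPy array of shape (n, 3) and builds its own index, so the ID mapping here only illustrates what an embedding library does internally.

```python
from __future__ import annotations

# Hypothetical OpenIE-style output: (subject, relation, object) strings.
RAW_TRIPLES = [
    ("Students", "must complete", "120 credits"),
    (" students", "must complete ", "120 Credits"),  # duplicate up to case/whitespace
    ("Linear Algebra", "is prerequisite of", "Machine Learning"),
    ("Machine Learning", "is elective for", "Computer Science"),
    ("The student who fails a course twice in one academic year", "must", "retake it"),
]


def clean_triples(raw, max_words=6):
    """Strip whitespace, drop empty or overly long parts, and deduplicate case-insensitively.

    The first-seen surface form of each triple is kept. max_words is an illustrative
    threshold for discarding clause-length arguments.
    """
    seen, cleaned = set(), []
    for triple in raw:
        parts = tuple(" ".join(p.split()) for p in triple)  # trim and collapse whitespace
        if any(not p or len(p.split()) > max_words for p in parts):
            continue
        key = tuple(p.lower() for p in parts)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(parts)
    return cleaned


def index_triples(triples):
    """Map entities and relations to consecutive integer IDs in first-seen order."""
    ent2id, rel2id, indexed = {}, {}, []
    for s, r, o in triples:
        for e in (s, o):
            ent2id.setdefault(e, len(ent2id))
        rel2id.setdefault(r, len(rel2id))
        indexed.append((ent2id[s], rel2id[r], ent2id[o]))
    return indexed, ent2id, rel2id


triples = clean_triples(RAW_TRIPLES)
indexed, ent2id, rel2id = index_triples(triples)
print(indexed)
```

With this input, the case-variant duplicate and the clause-length subject are dropped, leaving three triples indexed as `[(0, 0, 1), (2, 1, 3), (3, 2, 4)]`.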

## Model Evaluation and Experimental Results

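Three embedding algorithms were compared:
- ComplEx: uses complex-valued embeddings, which lets it capture antisymmetric relations such as "is a prerequisite of" (if course A is a prerequisite of course B, then B is not a prerequisite of A); it performed very well in the project's experiments;
- TransE: models each relation as a translation between entity vectors; it is computationally efficient and interpretable, and suits simple, direct relations;
- DistMult: a bilinear model that is good at capturing semantic similarity; because its score is symmetric in head and tail, however, it cannot distinguish a relation from its inverse.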
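
The three models differ mainly in their scoring function. The sketch below writes out the standard published forms in plain Python (a higher score means a more plausible triple); AmpliGraph's implementations may differ in details such as the norm TransE uses or how complex embeddings are stored. The toy vectors are hypothetical, chosen to show the antisymmetry property mentioned above.

```python
from __future__ import annotations

import math


def transe(h, r, t, norm=1):
    """TransE: -||h + r - t||; a perfect translation h + r == t scores 0."""
    diffs = [hi + ri - ti for hi, ri, ti in zip(h, r, t)]
    if norm == 1:
        return -sum(abs(d) for d in diffs)
    return -math.sqrt(sum(d * d for d in diffs))


def distmult(h, r, t):
    """DistMult: sum_i h_i * r_i * t_i, a diagonal bilinear form (symmetric in h and t)."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


def complex_score(h, r, t):
    """ComplEx: Re(sum_i h_i * r_i * conj(t_i)) over complex-valued embeddings."""
    return sum(hi * ri * ti.conjugate() for hi, ri, ti in zip(h, r, t)).real


# Hypothetical toy embeddings.
print(transe([0.0, 0.0], [1.0, 1.0], [1.0, 2.0]))  # h + r misses t by 1 in one dimension

h, r, t = [1.0, 0.5], [0.3, 2.0], [0.2, -1.0]
print(distmult(h, r, t), distmult(t, r, h))  # identical: DistMult is symmetric

hc, rc, tc = [1 + 1j], [1j], [1 + 0j]
print(complex_score(hc, rc, tc), complex_score(tc, rc, hc))  # opposite signs
```

DistMult gives `(h, r, t)` and `(t, r, h)` the same score for any vectors, whereas ComplEx with a purely imaginary relation vector flips the sign when head and tail are swapped, which is how it can tell "A is a prerequisite of B" apart from the reverse.
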

According to the project's experiments, the models capture the semantic relationships and structural characteristics of the university knowledge graph, and in the link prediction task they predict tail entities accurately, which provides reliable answers to students' questions.
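
Link prediction is usually quantified by ranking the true tail entity against every candidate entity and summarizing the ranks with mean reciprocal rank (MRR) and Hits@k; AmpliGraph provides helpers such as `mrr_score` and `hits_at_n_score` in `ampligraph.evaluation`. The sketch below reimplements the idea in plain Python with hypothetical, hand-set (untrained) DistMult embeddings. It uses the "filtered" setting, in which other known true tails are not counted as errors, and lets ties fall in the true triple's favour; the tie rule is an assumption, since evaluation protocols differ on it.

```python
from __future__ import annotations


def distmult(h, r, t):
    """DistMult score: sum_i h_i * r_i * t_i (higher = more plausible)."""
    return sum(a * b * c for a, b, c in zip(h, r, t))


def tail_rank(ent_emb, rel_emb, triple, known=frozenset()):
    """Rank of the true tail among all entities (1 = best).

    Filtered setting: candidates forming another known true triple (h, r, cand)
    are skipped. Ties with the true tail do not worsen the rank.
    """
    h, r, t = triple
    hv, rv = ent_emb[h], rel_emb[r]
    true_score = distmult(hv, rv, ent_emb[t])
    rank = 1
    for cand, cv in ent_emb.items():
        if cand == t or (h, r, cand) in known:
            continue
        if distmult(hv, rv, cv) > true_score:
            rank += 1
    return rank


def mrr_and_hits(ranks, k=1):
    """Mean reciprocal rank and Hits@k over a list of ranks."""
    mrr = sum(1.0 / rank for rank in ranks) / len(ranks)
    hits = sum(rank <= k for rank in ranks) / len(ranks)
    return mrr, hits


# Hypothetical 2-dimensional embeddings, hand-set for illustration.
ENT = {
    "Calculus": [1.0, 0.0],
    "Linear Algebra": [0.0, 1.0],
    "Machine Learning": [1.0, 1.0],
    "Art History": [-1.0, 0.0],
}
REL = {"is prerequisite of": [1.0, 1.0]}

# Known true triples (e.g. the training set), used for filtering.
KNOWN = {("Calculus", "is prerequisite of", "Machine Learning")}
TEST = [
    ("Linear Algebra", "is prerequisite of", "Machine Learning"),
    ("Calculus", "is prerequisite of", "Linear Algebra"),
]

ranks = [tail_rank(ENT, REL, triple, KNOWN) for triple in TEST]
mrr, hits1 = mrr_and_hits(ranks, k=1)
print(ranks, mrr, hits1)
```

Here `ranks` comes out as `[1, 2]`, giving an MRR of 0.75 and Hits@1 of 0.5. Without the filter, `Machine Learning` would also outrank the true tail of the second test triple, pushing its rank to 3.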

## Application Scenarios and Practical Value

The system offers value to two groups:
- Students: no need to page through PDFs; they get accurate, personalized answers by asking in natural language;
- Academic staff: repeated inquiries are handled automatically, so they can focus on more complex tasks.

Generality: the same technical framework can be transferred to other settings that require building a reasoning-enabled knowledge base from unstructured documents, such as Q&A over corporate regulations, interpretation of government policies, and querying medical guidelines.

## Summary and Outlook

The project implements a complete pipeline from unstructured documents to intelligent Q&A, combining a knowledge graph with embedding models to answer questions about both explicit and implicit relationships. Possible future extensions include:
- Multi-turn dialogue;
- Integrating large language models to make interactions more natural;
- Retrieval-Augmented Generation (RAG) to handle questions the knowledge graph does not cover.

Overall, the project offers a technical foundation and a practical reference for intelligent services in education.
