
VaultRAG: A Hybrid RAG System for Obsidian Notes Combining Vector Retrieval and Knowledge Graph

A hybrid RAG system built for Obsidian note libraries that combines vector retrieval with a knowledge graph. It supports multi-format document processing, overlapping chunking, runtime model switching, and knowledge graph-based query expansion, enabling AI question answering over a personal knowledge base.

Tags: RAG, Obsidian, Knowledge Graph, Vector Retrieval, Flask, Knowledge Management, LLM, Python

Published 2026-06-14 06:44


Section 01

Introduction to VaultRAG

VaultRAG is a hybrid Retrieval-Augmented Generation (RAG) system designed specifically for Obsidian note libraries. It combines vector retrieval with a knowledge graph so that an LLM can answer questions grounded in a user's own notes.

Basic Information:

Original author/maintainer: faielli
Source platform: GitHub
Release date: June 13, 2026
Project link: Python-RAG-vault

Core Features: Supports multi-format document processing, intelligent chunking, multi-model switching, incremental indexing, and knowledge graph-based query expansion.

Section 02

Project Background and Positioning

VaultRAG targets Obsidian users (researchers, students, and knowledge workers) who manage large volumes of notes, literature, and study materials, and aims to turn a static note library into a knowledge base that can be queried interactively. Pure vector retrieval ranks chunks by semantic similarity to the query, so it can miss answers that depend on explicit relationships spread across several notes. VaultRAG adds a knowledge graph to cover that kind of relational reasoning.

Section 03

Core Architecture and Hybrid Retrieval Mechanism

Modular Architecture

The system uses a dependency injection pattern to decouple components. The core modules are divided as follows:

Module	Responsibility
`app.py`	Flask entry point, responsible for configuration, routing, and frontend services
`rag_core.py`	Core logic: text extraction, chunking, embedding, ChromaDB management, knowledge graph construction, LLM calls
`upload_handler.py`	Flask blueprint for RAG over temporarily uploaded files (nothing is persisted)
`model_switcher.py`	Runtime model switching (no need to restart the application)
`frontend.html`	Single-page application frontend interface

Hybrid Retrieval Strategy

Vector Retrieval Layer: Uses the all-MiniLM-L6-v2 embedding model by default (the alternative flax-sentence-embeddings/st-codesearch-distilroberta-base can be selected in the code). Documents are split into 500-character chunks with a 50-character overlap.
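
The chunking parameters can be illustrated with a minimal character-window splitter. This is a sketch under assumptions: it cuts at raw character offsets, whereas the project may align chunks to sentence or paragraph boundaries, and the name chunk_text is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # with 500/50, each window starts 450 chars after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

With a 50-character overlap, any passage no longer than 50 characters that straddles a chunk boundary still appears whole in one of the two neighboring chunks.
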
Knowledge Graph Layer:

Samples 3 chunks from each document and uses the LLM to extract up to 15 triples in the form (subject | relation | object);
Supports incremental construction (only new files are processed);
Query expansion: Tokenization → calculate node overlap score → select Top-N seeds → expand 1-hop neighbors → collect related source files and relational text.
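
The knowledge graph steps can be sketched in plain Python. Everything here is an assumption for illustration: parse_triples and KnowledgeGraph are hypothetical names, nodes are lowercased entity strings, the overlap score is simply the number of query tokens found in a node label, and the LLM call that produces the triples is left out. The project's actual scoring and storage may differ.

```python
import re
from collections import defaultdict

MAX_TRIPLES = 15  # per-document cap on extracted triples, as described above


def parse_triples(llm_output: str, limit: int = MAX_TRIPLES) -> list[tuple[str, str, str]]:
    """Parse 'subject | relation | object' lines from an LLM reply, keeping at most `limit`."""
    triples = []
    for line in llm_output.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and all(parts):
            triples.append((parts[0], parts[1], parts[2]))
            if len(triples) == limit:
                break
    return triples


class KnowledgeGraph:
    def __init__(self) -> None:
        self.edges = defaultdict(list)   # node -> triples (subj, rel, obj) touching it
        self.sources = defaultdict(set)  # node -> files whose triples mention it
        self.processed_files = set()     # lets the graph be built incrementally

    def add_document(self, path: str, triples) -> None:
        if path in self.processed_files:
            return  # file is already in the graph: skip it
        for subj, rel, obj in triples:
            s, o = subj.lower(), obj.lower()
            for node in (s, o):
                self.edges[node].append((s, rel, o))
                self.sources[node].add(path)
        self.processed_files.add(path)

    def expand_query(self, query: str, top_n: int = 3):
        """Return (seed nodes, related source files, relation sentences) for a query."""
        tokens = set(re.findall(r"\w+", query.lower()))
        # Overlap score: number of query tokens that also occur in the node label
        scored = [(len(tokens & set(re.findall(r"\w+", node))), node) for node in self.edges]
        scored = sorted((item for item in scored if item[0] > 0),
                        key=lambda item: (-item[0], item[1]))
        seeds = [node for _, node in scored[:top_n]]

        files: set[str] = set()
        relations: list[str] = []
        for seed in seeds:
            files |= self.sources[seed]
            for s, rel, o in self.edges[seed]:  # 1-hop expansion
                neighbor = o if s == seed else s
                files |= self.sources[neighbor]
                text = f"{s} {rel} {o}"
                if text not in relations:
                    relations.append(text)
        return seeds, sorted(files), relations
```

In this sketch the seed obsidian matches the query directly, and the 1-hop step then reaches neighbors such as markdown, which brings in files that never mention the query term at all.
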

Section 04

Multi-format Support and Intelligent Features

Multi-format Document Processing

Format	Processing Method
Markdown, TXT	Direct reading
PDF	Text extraction via PyMuPDF; falls back to Tesseract OCR (at 200 DPI) for scanned PDFs
DOCX	Parsed with python-docx
EPUB	Extract HTML content using ebooklib + BeautifulSoup
ODT, ODS	Parsed with odfpy
HTML, HTM	Extract plain text with BeautifulSoup
Note: OCR is configured for mixed Italian-English documents (Tesseract language setting `ita+eng`).
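
Format handling amounts to a dispatch on file extension. The sketch below covers only the plain-text and HTML rows and uses the standard library's html.parser as a stand-in for BeautifulSoup; the PDF, DOCX, EPUB and ODT/ODS extractors are omitted, and the names extract_text and EXTRACTORS are hypothetical.

```python
from html.parser import HTMLParser
from pathlib import Path


class _TextCollector(HTMLParser):
    """Collect visible text nodes, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def read_plain(path: Path) -> str:
    return path.read_text(encoding="utf-8", errors="replace")


def read_html(path: Path) -> str:
    return html_to_text(read_plain(path))


# Extension -> extractor. PDF, DOCX, EPUB and ODT/ODS entries would wrap PyMuPDF/Tesseract,
# python-docx, ebooklib and odfpy respectively; they are left out of this sketch.
EXTRACTORS = {
    ".md": read_plain,
    ".txt": read_plain,
    ".html": read_html,
    ".htm": read_html,
}


def extract_text(path: str) -> str:
    p = Path(path)
    extractor = EXTRACTORS.get(p.suffix.lower())
    if extractor is None:
        raise ValueError(f"unsupported format: {p.suffix}")
    return extractor(p)
```

A table keyed by extension keeps each format's extractor isolated, so adding a format means adding one function and one entry.
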

Intelligent Features

Incremental Indexing: Unmodified files are skipped using a {path: mtime} mapping;
Duplicate Detection: Duplicate content is identified using a cosine similarity threshold of dup_threshold=0.97;
Conversation History: The last 20 rounds are retained, and conversations are saved automatically as Markdown files with YAML frontmatter in _chat/;
Discipline Filtering: Searches can be restricted to a discipline/folder, falling back to a global search if the filtered search returns no results.
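
The four features above reduce to small, mostly independent routines. The following sketch is illustrative only: every function name is hypothetical, vectors are plain lists rather than ChromaDB embeddings, the Markdown layout and the _chat/ file-naming scheme are invented, and search_with_fallback assumes a retriever that accepts an optional metadata filter.

```python
import json
import math
import os
from datetime import datetime

DUP_THRESHOLD = 0.97  # dup_threshold value from the article
MAX_ROUNDS = 20       # conversation rounds kept in memory


def files_to_index(current: dict[str, float], previous: dict[str, float]) -> list[str]:
    """Paths that are new or whose mtime differs from the stored {path: mtime} map."""
    return [path for path, mtime in current.items() if previous.get(path) != mtime]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_duplicate(vec: list[float], existing: list[list[float]],
                 threshold: float = DUP_THRESHOLD) -> bool:
    """Treat a chunk as a duplicate if any stored vector is at least `threshold` similar."""
    return any(cosine(vec, other) >= threshold for other in existing)


def trim_history(history: list[tuple[str, str]], max_rounds: int = MAX_ROUNDS) -> list[tuple[str, str]]:
    """Keep only the most recent (user, assistant) rounds."""
    return history[-max_rounds:]


def history_to_markdown(history: list[tuple[str, str]], title: str, created: datetime) -> str:
    """Render a chat as Markdown with YAML frontmatter (a JSON string is also a valid YAML scalar)."""
    lines = ["---", f"title: {json.dumps(title, ensure_ascii=False)}",
             f"created: {created.isoformat()}", f"rounds: {len(history)}", "---", ""]
    for user, assistant in history:
        lines += [f"**User:** {user}", "", f"**Assistant:** {assistant}", ""]
    return "\n".join(lines)


def chat_file_path(vault_root: str, created: datetime) -> str:
    """Hypothetical naming scheme for files saved under _chat/."""
    return os.path.join(vault_root, "_chat", f"chat-{created:%Y%m%d-%H%M%S}.md")


def search_with_fallback(search, query: str, discipline: str = ""):
    """`search(query, where)` is a hypothetical retriever; where=None means no filter."""
    if discipline:
        hits = search(query, {"discipline": discipline})
        if hits:
            return hits
    return search(query, None)  # no filter given, or the filtered search found nothing
```

The current map would be built with os.path.getmtime for each vault file and stored only after indexing succeeds, so a failed run is retried on the next pass.
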

Section 05

Key Technical Configuration Points

LLM Configuration

Default model: qwen-plus
API endpoint: OpenRouter (compatible with OpenAI API format)
Max tokens: 8192
Supports runtime model switching (no need to restart the service)
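
Because OpenRouter accepts the OpenAI chat-completions format, a request can be built with the standard library alone. The sketch assumes the class name ModelSwitcher (echoing model_switcher.py, not reproducing its contents) and uses "qwen-plus" exactly as the article names it; OpenRouter model IDs are usually provider-prefixed, so the string the project actually sends may differ. The lock lets a switch take effect on the next request without restarting the process.

```python
import json
import threading
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"
DEFAULT_MODEL = "qwen-plus"  # as named in the article; the exact OpenRouter ID is an assumption
MAX_TOKENS = 8192


class ModelSwitcher:
    """Holds the active model; switching affects subsequent requests, no restart needed."""

    def __init__(self, model: str = DEFAULT_MODEL) -> None:
        self._model = model
        self._lock = threading.Lock()

    @property
    def model(self) -> str:
        with self._lock:
            return self._model

    def switch(self, model: str) -> None:
        with self._lock:
            self._model = model

    def build_request(self, messages: list[dict], api_key: str) -> urllib.request.Request:
        """Build (but do not send) an OpenAI-format chat completion request."""
        payload = {"model": self.model, "messages": messages, "max_tokens": MAX_TOKENS}
        return urllib.request.Request(
            OPENROUTER_URL,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
            method="POST",
        )
```

Sending the request with urllib.request.urlopen(req) and reading choices[0].message.content from the decoded JSON follows the OpenAI response format.
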

Embedding Model Recommendation

Where Italian text dominates, multilingual-e5-large is recommended over the default all-MiniLM-L6-v2 for better multilingual semantic matching.
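
Switching to multilingual-e5-large involves more than changing the model name: E5 models are trained with "query: " and "passage: " prefixes on their inputs, while all-MiniLM-L6-v2 takes raw text. A minimal sketch of that preprocessing step, where the helper name prepare_for_embedding is hypothetical:

```python
DEFAULT_EMBEDDING_MODEL = "all-MiniLM-L6-v2"
MULTILINGUAL_EMBEDDING_MODEL = "intfloat/multilingual-e5-large"


def prepare_for_embedding(texts: list[str], model_name: str, is_query: bool) -> list[str]:
    """Add the input prefixes E5 models expect; other models receive the text unchanged."""
    name = model_name.lower().split("/")[-1]
    if name.startswith("e5-") or name.startswith("multilingual-e5"):
        prefix = "query: " if is_query else "passage: "
        return [prefix + t for t in texts]
    return list(texts)
```

Because the two models produce vectors of different sizes (384 dimensions for all-MiniLM-L6-v2, 1024 for multilingual-e5-large), switching models also means re-embedding the vault into a fresh collection.
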

Section 06

Use Cases and Value Proposition

VaultRAG is suitable for the following scenarios:

Academic Research: Quickly locate relevant concepts and citations in literature notes;
Course Learning: Integrate courseware, textbooks, and notes to build a personal learning assistant;
Project Knowledge Management: Unified retrieval of technical documents and code notes;
Writing Assistance: Create content based on existing materials, ensuring accurate citations.

Section 07

Summary and Insights

VaultRAG illustrates a typical pattern for RAG applications in personal knowledge management:

Hybrid architecture is key to retrieval quality, compensating for the weak relational reasoning of pure vector retrieval;
Incremental processing and duplicate detection are essential for a practical system;
Multi-format support lowers the barrier to building a knowledge base;
Modular design makes the system easier to maintain and extend.

For users who want to add AI question answering to their Obsidian note libraries, VaultRAG is a fully functional reference implementation with a clear architecture.
