LLM_MVC: A Minimal Implementation of Local RAG Q&A Bot

A Retrieval-Augmented Generation (RAG) Q&A system based on local Markdown knowledge bases, supporting automatic chunking, ChromaDB vector storage, multi-file indexing, and citation-enabled answer generation.

Tags: RAG · LLM · Vector Database · ChromaDB · Knowledge Base · Markdown · OpenAI · Text Chunking · Semantic Retrieval · Python
Published 2026-04-25 14:44 · Recent activity 2026-04-25 14:47 · Estimated read 7 min

Section 01

Introduction: LLM_MVC—A Minimal Local RAG Q&A Bot Implementation

LLM_MVC is a Minimal Viable Code implementation of a local RAG Q&A system built on Markdown knowledge bases. It supports automatic chunking, ChromaDB vector storage, multi-file indexing, and citation-enabled answer generation. The project has minimal dependencies (only three core libraries: openai, chromadb, python-dotenv) and concise code. It aims to help developers understand the core working principles of RAG with a very low barrier to entry, while remaining practical enough for direct use in personal knowledge base management and Q&A.
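As a rough sketch of how little wiring the three core dependencies need (standard usage of each library; the storage path and collection name below are illustrative, not taken from the project):

```python
# Minimal wiring of the three core dependencies (illustrative sketch, not the project's code).
import chromadb                     # chromadb: local, persistent vector store
from dotenv import load_dotenv      # python-dotenv: load settings from .env
from openai import OpenAI           # openai: embeddings and chat completions

load_dotenv()                       # loads OPENAI_API_KEY and other settings from .env

client = OpenAI()                   # picks up OPENAI_API_KEY from the environment
store = chromadb.PersistentClient(path="./chroma_db")           # illustrative path
collection = store.get_or_create_collection("knowledge_base")   # illustrative name
```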


Section 02

Project Background and Positioning

LLM_MVC is developed and maintained by Holden-Lin. The 'MVC' in the project name stands for Minimal Viable Code. Its core value is demonstrating the core principles of a RAG system in as few lines of code as possible while remaining practical. Unlike complex RAG frameworks, it requires only three core dependencies, which lowers installation and maintenance costs and lets developers clearly trace the data flow.


Section 03

Core Architecture and Vectorized Retrieval

LLM_MVC follows the standard RAG paradigm: User Query → Embedding Vectorization → ChromaDB top-k Retrieval → LLM Generates Cited Answers. The system uses OpenAI's text-embedding-3-small model (configurable) to convert queries and documents into vectors, which are stored in a local, persistent ChromaDB instance. When a user asks a question, it computes the query vector and retrieves the top-k text fragments with the highest semantic similarity.
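A condensed sketch of this retrieval path, assuming the client and collection objects from the earlier snippet and a collection already populated with document chunks; the chat model name and prompt wording are illustrative:

```python
def answer(query: str, k: int = 4) -> str:
    # 1. Vectorize the user query with the embedding model.
    emb = client.embeddings.create(model="text-embedding-3-small", input=query)
    query_vec = emb.data[0].embedding

    # 2. Retrieve the k most similar chunks from ChromaDB.
    hits = collection.query(query_embeddings=[query_vec], n_results=k)
    chunks = hits["documents"][0]

    # 3. Ask the LLM to answer from the numbered context only.
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[
            {"role": "system",
             "content": "Answer using only the numbered context and cite sources like [1]."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    )
    return resp.choices[0].message.content
```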


Section 04

Detailed Explanation of Intelligent Chunking Strategy

Document chunking is a key step in RAG. LLM_MVC implements an intelligent chunking mechanism that automatically detects document structure (a sketch of the separator-based mode follows the list below):

  • Separator Type: For note-like files whose entries are separated by ---, each entry becomes an independent chunk; entries that are too long are further split by paragraph while retaining overlapping regions.
  • Heading Hierarchy Type: For standard long-form Markdown articles, the text is split at heading levels # through ####, and each chunk inherits its complete heading chain (e.g., Product Guide > Installation > Environment Requirements).

Both modes support paragraph merging, a 200-character overlapping window, and intelligent sentence breaking.
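As a rough illustration (not the project's actual code), the separator-based mode can be sketched with plain string splitting and a character-level overlap window; the real implementation's paragraph merging and sentence breaking are more involved:

```python
def chunk_separator_notes(text: str, max_len: int = 800, overlap: int = 200) -> list[str]:
    """Chunk a note-like file whose entries are separated by '---' lines (illustrative)."""
    chunks = []
    for entry in text.split("\n---\n"):
        entry = entry.strip()
        if not entry:
            continue
        if len(entry) <= max_len:
            chunks.append(entry)            # short entry: one chunk per entry
        else:
            # Oversized entry: re-split with an overlapping character window.
            start = 0
            while start < len(entry):
                chunks.append(entry[start:start + max_len])
                start += max_len - overlap
    return chunks
```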

Section 05

Citation-Enabled Answers and Configuration Usage

Citation-Enabled Answers: The system instructs the LLM to mark source numbers such as [1][2] in its answers; after generation, it automatically outputs a reference list (including the original text fragments and file paths).

Configuration and Interaction: Configuration (knowledge base path, chunking parameters, retrieval parameters, model settings, etc.) is managed via .env; after startup, the program enters a REPL that supports commands such as /debug (show only the retrieved chunks), /reindex (rebuild the index), and /quit (exit).
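A sketch of the reference-list step, reusing the hits dictionary returned by the ChromaDB query in the earlier retrieval sketch; the "source" metadata key and the output format are assumptions, not taken from the project:

```python
def print_references(hits: dict) -> None:
    # Emit a numbered reference list from the ChromaDB query result used for the answer.
    docs = hits["documents"][0]
    metas = hits["metadatas"][0]
    for i, (doc, meta) in enumerate(zip(docs, metas), start=1):
        path = meta.get("source", "unknown")     # "source" is a hypothetical metadata key
        snippet = doc[:80].replace("\n", " ")
        print(f"[{i}] {path}: {snippet}...")
```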


Section 06

Index Update Mechanism and Application Scenarios

Index Update: At startup, the system computes an MD5 hash of the knowledge base files and compares it with the hash stored in ChromaDB; if they match, indexing is skipped; if they differ, the index is rebuilt in full. A rebuild can also be triggered manually with /reindex.

Application Scenarios: Personal knowledge management, customer-service knowledge bases (with URL references), RAG learning entry points, and rapid prototype verification.
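A minimal sketch of the startup hash check described above, assuming the previous fingerprint is kept in the collection's metadata; the kb_hash key, the file layout, and the rebuild step are illustrative assumptions:

```python
import hashlib
from pathlib import Path

def knowledge_base_hash(kb_dir: str) -> str:
    # Fold the bytes of every Markdown file into a single MD5 fingerprint.
    md5 = hashlib.md5()
    for path in sorted(Path(kb_dir).rglob("*.md")):
        md5.update(path.read_bytes())
    return md5.hexdigest()

current = knowledge_base_hash("./knowledge_base")        # illustrative path
stored = (collection.metadata or {}).get("kb_hash")      # "kb_hash" is a hypothetical key
if current == stored:
    print("Knowledge base unchanged; skipping indexing.")
else:
    print("Knowledge base changed; rebuilding index...")  # re-chunk, re-embed, re-insert
```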


Section 07

Technical Highlights and Insights

The technical highlights of LLM_MVC include:

  1. An adaptive chunking strategy that handles diverse document formats;
  2. Heading-chain inheritance that improves semantic matching during retrieval;
  3. A complete citation mechanism that balances fluency and traceability;
  4. Hash checks that speed up index updates and save API costs.

Section 08

Conclusion: The Value of Minimal RAG

LLM_MVC proves that a practical RAG system does not require a complex architecture or heavy dependencies. Through carefully designed chunking strategies, clear configuration management, and an efficient indexing mechanism, it provides an ideal starting point, whether for personal knowledge management or as an entry case for learning RAG in depth. Reading its source code is an intuitive way to understand how RAG works.