Zing Forum

Two-way-RAG: Building a Local Voice-Interactive Document Knowledge Base System

Explore the Two-way-RAG project, a voice-interactive Retrieval-Augmented Generation (RAG) system based on FastAPI, LangChain, and Ollama. This article details how to convert local documents into a conversational knowledge base, enabling a fully private AI Q&A experience.

Tags: RAG, LLM, Ollama, FastAPI, LangChain, FAISS, voice interaction, local deployment, knowledge base, semantic retrieval
Published 2026-04-13 14:45 · Recent activity 2026-04-13 14:49 · Estimated read 6 min

Section 01

Two-way-RAG: Local Voice-Interactive Document Knowledge Base System (Main Guide)

Two-way-RAG is a local voice-interactive Retrieval-Augmented Generation (RAG) system built with FastAPI, LangChain, and Ollama. Its core features include:

  1. Privacy-first design: all data stays local, with Ollama serving the Llama3.2 model and FAISS powering semantic search.
  2. Voice interaction: voice input via the Web Speech API and voice output via gTTS.
  3. Private document Q&A: natural dialogue with personal documents, with no data ever leaving the machine.

Together, these features address the need for a secure, fully local AI-powered document knowledge base.

Section 02

Project Background & Core Concept

Project Background

As LLMs proliferate, a key concern is how to let AI understand private documents without compromising data privacy. Two-way-RAG addresses this by providing a voice-interactive RAG system that can be deployed entirely locally.

Core Concept

  • Localization first: uses Ollama to run Llama3.2 locally and FAISS for efficient semantic retrieval; all data remains on the user's machine.
  • Voice interaction: Allows voice input and output for natural dialogue experience.

Section 03

Technical Architecture Deep Dive

Technical Stack

  • Backend: FastAPI (high-performance async web framework).
  • RAG Pipeline: LangChain (componentized design for document loading, text splitting, embedding, retrieval).
  • Vector Storage: FAISS (for fast similarity search).
  • Embedding Model: all-MiniLM-L6-v2 (lightweight yet effective sentence embedding).
  • Voice Interaction: Web Speech API (STT) and gTTS (TTS) for bidirectional voice support.

Section 04

Core Functions & Usage Scenarios

Core Functions

  1. Flexible document access: Preload from pre_trained_data directory or upload PDF/TXT dynamically.
  2. Smart dialogue handling: Direct LLM response for greetings/chats; RAG for knowledge queries.
  3. Session history: auto-saved in the browser's localStorage.
  4. Reinitialize: One-click rebuild of knowledge base after updating pre-trained docs.
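The "smart dialogue handling" behavior described above can be sketched as a small router that decides whether a message is small talk (answered by the LLM directly) or a knowledge query (answered via retrieval). This is an illustrative stand-in, not the project's actual code; the function name and keyword list are assumptions.

```python
# Illustrative sketch of greeting-vs-knowledge routing (assumed, not the
# project's actual implementation).

GREETINGS = {"hi", "hello", "hey", "thanks", "thank you", "bye"}

def route_query(text: str) -> str:
    """Return 'chat' for small talk (direct LLM response)
    or 'rag' for knowledge queries (retrieval-augmented response)."""
    normalized = text.strip().lower().rstrip("!?.")
    return "chat" if normalized in GREETINGS else "rag"

print(route_query("Hello!"))                    # chat
print(route_query("What does chapter 3 say?"))  # rag
```

A real system might use an LLM call or a classifier here; a keyword check just keeps the routing idea visible.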

Usage Scenarios

Suited to both batch initialization (preloading a document set from pre_trained_data) and incremental updates (uploading new documents to an existing knowledge base).


Section 05

Deployment & Operation Guide

Prerequisites

  • Python 3.9+
  • Ollama installed with llama3.2:latest model.

Installation Steps

  1. Clone the repository.
  2. Create and activate a virtual environment.
  3. Install dependencies (FastAPI, LangChain, FAISS, Sentence Transformers, gTTS, etc.).
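For step 3, the dependency list might look like the following requirements.txt; exact package names and the absence of version pins are assumptions based on the stack described above (faiss-cpu is the common CPU-only build of FAISS).

```text
# Illustrative requirements.txt (package names assumed from the listed stack)
fastapi
uvicorn
langchain
langchain-community
faiss-cpu
sentence-transformers
gTTS
```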

Startup

Run uvicorn main:app --reload, then open http://localhost:8000 in a browser; the responsive interface includes a chat-history sidebar.


Section 06

RAG Pipeline Working Principle

Document Processing Phase

When starting or uploading docs: Split text into chunks → generate embeddings via all-MiniLM-L6-v2 → store in FAISS index.
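The splitting step can be illustrated with a fixed-size character splitter with overlap, so that content near a chunk boundary survives intact in at least one chunk. The project uses LangChain's text splitters; this pure-Python version, with arbitrary toy sizes, only shows the idea.

```python
def split_into_chunks(text: str, chunk_size: int = 20, overlap: int = 5) -> list[str]:
    """Split text into fixed-size character chunks with overlap.
    (Toy stand-in for LangChain's text splitters; sizes are illustrative.)"""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

chunks = split_into_chunks("0123456789" * 5)   # 50 characters
print(len(chunks))                              # 3
print(chunks[0][-5:] == chunks[1][:5])          # True: boundary text is duplicated
```

Each chunk is then embedded (here, by all-MiniLM-L6-v2) and the resulting vectors are added to the FAISS index.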

Query Processing Phase

  • Intent recognition: greetings and small talk go straight to the LLM; knowledge queries are converted to a vector → FAISS is searched for the most similar chunks.
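The FAISS lookup can be mimicked with brute-force cosine similarity; FAISS does the same ranking at scale with optimized index structures. The 3-d vectors below are toy stand-ins for all-MiniLM-L6-v2's 384-dimensional embeddings.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k chunks most similar to the query vector."""
    ranked = sorted(index.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [chunk_id for chunk_id, _ in ranked[:k]]

# Toy 3-d "embeddings" standing in for real 384-d sentence embeddings.
index = {
    "chunk-a": [1.0, 0.1, 0.0],
    "chunk-b": [0.0, 1.0, 0.2],
    "chunk-c": [0.9, 0.2, 0.1],
}
print(top_k([1.0, 0.0, 0.0], index, k=2))  # ['chunk-a', 'chunk-c']
```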

Answer Generation Phase

Combine retrieved chunks with query into prompt → send to local Llama3.2 → generate answer (all local, fast response).
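The "combine retrieved chunks with query into prompt" step amounts to prompt templating before the call to the local Llama3.2 via Ollama. The template wording below is an assumption for illustration, not the project's actual prompt.

```python
def build_prompt(query: str, chunks: list[str]) -> str:
    """Assemble retrieved context and the user question into one prompt
    for the local LLM. (Wording is illustrative, not the project's template.)"""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

prompt = build_prompt("What is FAISS?", ["FAISS is a library for similarity search."])
print(prompt)
```

The assembled string is then sent to the model; because inference, retrieval, and embedding all run on the same machine, there is no network round trip to an external API.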


Section 07

Application Scenarios & Practical Value

Application Scenarios

  1. Researchers: Personal literature assistant for quick retrieval/summary of academic papers.
  2. Enterprises: Internal knowledge base for employees to query company docs/manuals.
  3. Developers: Learning example for RAG tech stack (clear code structure, detailed comments).

Practical Value

Represents the shift from general-purpose models to specialized systems: grounding answers in retrieved documents yields more accurate responses in specific domains and reduces hallucinations.


Section 08

Summary & Outlook

Summary

Two-way-RAG is a well-designed open-source RAG system combining voice interaction, local LLM inference, and semantic retrieval. It provides a practical private knowledge base solution with high code quality and comprehensive documentation.

Outlook

As local LLM performance and vector database technology evolve, such systems will become more powerful and user-friendly. Ideal for users who prioritize data privacy and local AI usage.