Zing Forum

LangChain-based PDF RAG System: Building a Localized Intelligent Document Q&A Assistant

A complete Retrieval-Augmented Generation (RAG) system that supports automatic arXiv paper downloading, vectorized storage of PDF and Markdown documents, and persistent session memory, with interactive Q&A and chat available from the CLI.

Tags: RAG · LangChain · PDF Q&A · Document Retrieval · LangGraph · Vector Database · Chroma
Published 2026-04-17 22:13 · Recent activity 2026-04-17 22:19 · Estimated read: 7 min

Section 01

[Introduction] LangChain-based PDF RAG System: Localized Intelligent Document Q&A Assistant

This article introduces the open-source project langchain-pdf-rag, built on LangChain and LangGraph, which implements a complete Retrieval-Augmented Generation (RAG) system. Its core features include automatic arXiv paper downloading, vectorized storage of multi-format documents, persistent session memory, and CLI interactive Q&A and chat functions. It is particularly suitable for scenarios like academic research, providing a solution for efficiently extracting knowledge from PDF documents.


Section 02

Project Background: Challenges in PDF Knowledge Extraction and RAG Technical Solutions

In the era of information explosion, researchers and knowledge workers face the challenge of extracting valuable knowledge from massive PDF documents. Retrieval-Augmented Generation (RAG) technology provides an elegant solution to this problem by combining large language models with document retrieval. The langchain-pdf-rag project, built on LangChain and LangGraph, is a fully functional, clearly structured PDF Q&A system suitable for academic research scenarios.


Section 03

Core Features Overview: A Toolset Covering the Entire RAG Workflow

The project implements the complete workflow of a RAG system, with main features including:

  • Automatic arXiv paper collection: Batch download by topic and export metadata
  • Multi-format document support: PDF and Markdown document ingestion
  • Configurable embedding models: OpenAI cloud embedding and Hugging Face local embedding
  • Persistent session memory: SQLite-based chat history storage
  • Three interaction modes: Document ingestion, single Q&A, interactive chat

Section 04

Technical Architecture: Modular Three-Layer Design

The project adopts a modular design with three layers:

  1. Document Ingestion Layer: Responsible for PDF parsing, text chunking, and vectorization. It uses the Chroma vector database and supports custom chunking strategies and embedding-model selection (e.g., local sentence-transformers models).
  2. Retrieval Layer: Encapsulates the creation, loading, and querying of vector storage. Retrieval parameters (such as the number of returned documents RETRIEVAL_K) are configured via environment variables.
  3. Agent Layer: Builds the conversation flow based on LangGraph, enabling collaboration between retrieval tools and LLM to ensure answers are based on document content.

Section 05

Quick Start: From Environment Setup to Q&A Experience

Deployment steps:

  1. Environment Preparation: Create a virtual environment and install dependencies (pip install -r requirements.txt); the dependencies for local embeddings are optional.
  2. Configure API Key: Copy .env.example to .env, fill in the OpenAI API key, and select the embedding provider (openai or local).
  3. Obtain Documents: Use the script to download papers from arXiv (e.g., query RAG-related papers in the cs.AI topic).
  4. Build Knowledge Base: Execute python -m src.main ingest to build the vector index.
  5. Start Q&A: Single question (ask command) or interactive chat (chat command).
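Put together, the steps above might look like the following shell session. Only the `python -m src.main ingest/ask/chat` commands are named in the article; the virtual-environment commands are standard practice, and the download-script path and flags are illustrative placeholders, not the project's actual script.

```shell
# 1. Environment preparation
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# 2. Configure the API key, then edit .env (OpenAI key, embedding provider)
cp .env.example .env

# 3. Download papers from arXiv (script name and flags are illustrative)
# python scripts/download_arxiv.py --query "retrieval augmented generation" --category cs.AI

# 4. Build the vector index
python -m src.main ingest

# 5. Ask a single question, or start an interactive chat
python -m src.main ask "What is retrieval-augmented generation?"
python -m src.main chat
```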

Section 06

Deployment Flexibility: Switching Between Cloud and Local Solutions

The project supports two deployment solutions:

  • Cloud Solution: Uses OpenAI's text-embedding-3-small model, no local GPU required, suitable for quick verification and production deployment.
  • Local Solution: Uses Hugging Face open-source embedding models, combined with a locally served LLM (e.g., via Ollama), to achieve fully offline, private knowledge-base Q&A that meets data-privacy requirements. After switching embedding models, re-run the ingest command to rebuild the vector database.
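Switching between the two solutions might look like the following .env fragment. The variable names here are illustrative assumptions; the authoritative names come from the project's .env.example.

```shell
# Illustrative .env fragment -- exact variable names come from .env.example
EMBEDDINGS_PROVIDER=local                # "openai" or "local"
OPENAI_API_KEY=sk-...                    # needed only when the provider is "openai"
EMBEDDING_MODEL=text-embedding-3-small   # cloud model named in the article
LOCAL_EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2  # assumed default
```

Because vectors produced by different embedding models are not interchangeable, any provider or model change requires re-running the ingest command so the Chroma index is rebuilt with the new embeddings.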

Section 07

Performance Optimization and Applicable Scenarios

Performance Optimization Suggestions:

  • Adjust RETRIEVAL_K to control the number of retrieved documents, balancing quality and latency;
  • Limit DOC_PREVIEW_CHARS to reduce context length;
  • Add --delay-seconds during arXiv collection to avoid rate limits.

Applicable Scenarios: Academic research, technical document Q&A, report analysis, learning assistance.
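In .env terms, the first two knobs could be set as follows. The variable names RETRIEVAL_K and DOC_PREVIEW_CHARS come from the article; the values shown are illustrative defaults, not recommendations from the project.

```shell
RETRIEVAL_K=4           # fewer retrieved chunks -> lower latency, tighter context
DOC_PREVIEW_CHARS=500   # cap characters of each document preview sent to the LLM
# During collection, pass --delay-seconds (e.g., 3) to the arXiv download script
# to stay under arXiv's rate limits.
```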

Section 08

Summary: The Value of a Practical RAG Reference Implementation

The langchain-pdf-rag project demonstrates how to build RAG applications using modern AI toolchains. Its clear code structure, flexible configuration options, and complete example workflow provide an excellent reference for developers. Whether you want to quickly build a document Q&A system or learn best practices for LangChain and LangGraph, this project is worth studying and referencing.