System Architecture Overview
The system uses a modular, layered design that splits the retrieval-augmented generation (RAG) pipeline into components that can be optimized independently. The architecture consists of:
- Document Parsing Layer: Uses Docling for high-quality PDF parsing
- Index Layer: Builds hybrid vector indexes and supports parent document retrieval
- Retrieval Layer: Implements hybrid search strategies
- Re-ranking Layer: Uses an LLM to score and re-rank retrieved candidates
- Generation Layer: Supports multi-model integration and chain-of-thought reasoning
- Query Routing Layer: Classifies each query and routes it to the matching processing flow
Core Technology Details
1. Custom PDF Parsing and Docling Integration
The system uses Docling as its PDF parsing engine, chosen for its layout understanding, semantic preservation, and metadata extraction. On top of Docling's output, the system adds custom strategies: intelligent chunking, context association, table processing, and image description.
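To make the chunking and context association ideas concrete, here is a minimal sketch. It assumes the parsed document is already available as Markdown text (for example, from the parser's Markdown export) and that "context association" means attaching the enclosing heading path to each chunk. The names `Chunk` and `chunk_markdown` and the `max_chars` limit are hypothetical and not taken from the system itself.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    heading_path: tuple[str, ...]  # enclosing headings, outermost first
    text: str


def chunk_markdown(markdown: str, max_chars: int = 500) -> list[Chunk]:
    """Split Markdown at headings and keep each chunk's heading path as context."""
    chunks: list[Chunk] = []
    path: list[str] = []    # stack of headings currently in scope
    buffer: list[str] = []  # body lines collected for the current chunk

    def flush() -> None:
        text = "\n".join(buffer).strip()
        if text:
            chunks.append(Chunk(tuple(path), text))
        buffer.clear()

    for line in markdown.splitlines():
        if line.startswith("#"):
            flush()  # close the chunk that belongs to the previous heading
            level = len(line) - len(line.lstrip("#"))
            del path[level - 1:]  # drop headings at the same or a deeper level
            path.append(line.lstrip("#").strip())
        else:
            current_len = sum(len(item) + 1 for item in buffer)
            if buffer and current_len + len(line) > max_chars:
                flush()  # size limit reached: start a new chunk, same heading
            buffer.append(line)
    flush()
    return chunks
```

Storing the heading path with each chunk lets later stages prepend it to the chunk text before embedding, so a fragment such as "Costs fell." still carries the section it came from.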
2. Hybrid Vector Search
Combines dense vector retrieval (semantic understanding, fuzzy matching) with sparse retrieval (BM25 keyword matching). The two result lists are then merged using mechanisms such as dynamic weighting and Reciprocal Rank Fusion (RRF).
3. Parent Document Retrieval
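Adopts a two-stage retrieval strategy to counter the semantic fragmentation that chunking causes: first retrieve small sub-chunks, then fetch their parent documents to expand the context passed to the model (sub-chunk retrieval → parent document acquisition → context expansion).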
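The two stages can be sketched as follows. This is a simplified illustration, assuming sub-chunks carry a `parent_id` and that a scoring function is supplied by the caller; the names `ChildChunk`, `retrieve_parents`, and `word_overlap` are hypothetical, and the word-overlap scorer merely stands in for real vector similarity.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ChildChunk:
    chunk_id: str
    parent_id: str
    text: str


def word_overlap(query: str, text: str) -> float:
    """Toy relevance score: number of lowercase words shared by query and text."""
    return float(len(set(query.lower().split()) & set(text.lower().split())))


def retrieve_parents(query: str, children: list[ChildChunk],
                     parents: dict[str, str],
                     score_fn: Callable[[str, str], float] = word_overlap,
                     top_k: int = 3) -> list[str]:
    """Stage 1: rank sub-chunks. Stage 2: return their de-duplicated parents."""
    ranked = sorted(children, key=lambda c: score_fn(query, c.text),
                    reverse=True)[:top_k]
    parent_ids: list[str] = []
    for child in ranked:
        if child.parent_id not in parent_ids:  # several hits may share a parent
            parent_ids.append(child.parent_id)
    return [parents[pid] for pid in parent_ids]
```

Matching on small sub-chunks keeps retrieval precise, while returning the parent text gives the generator the surrounding context that a single sub-chunk lacks.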
4. Intelligent LLM Re-ranking
Improves result quality in four steps: candidate pool construction → LLM relevance scoring → re-ranking → Top-N selection.
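The pipeline above might look like the following sketch. The LLM is represented by a plain callable `llm(prompt) -> str` so that any provider can be plugged in; the prompt wording, the 0 to 10 scale, and the names `build_scoring_prompt`, `parse_score`, and `rerank` are all assumptions made for illustration.

```python
import re
from typing import Callable


def build_scoring_prompt(query: str, passage: str) -> str:
    """Ask the model for a single relevance number (hypothetical wording)."""
    return ("Rate from 0 to 10 how relevant the passage is to the query. "
            "Reply with the number only.\n"
            f"Query: {query}\nPassage: {passage}")


def parse_score(reply: str) -> float:
    """Extract the first number in the reply, clamped to [0, 10]; 0.0 if none."""
    match = re.search(r"\d+(?:\.\d+)?", reply)
    if match is None:
        return 0.0
    return max(0.0, min(10.0, float(match.group())))


def rerank(query: str, candidates: list[str],
           llm: Callable[[str], str], top_n: int = 3) -> list[str]:
    """Score every candidate with the LLM and keep the top_n best passages."""
    scored = [(parse_score(llm(build_scoring_prompt(query, passage))), passage)
              for passage in candidates]
    # sort() is stable, so passages with equal scores keep retrieval order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [passage for _, passage in scored[:top_n]]
```

Parsing defensively matters here: model replies are free text, so a reply without a number is treated as irrelevant rather than raising an error mid-pipeline.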
5. Multi-model Integration
Supports OpenAI GPT, Google Gemini, and other models, and selects among them through task routing, cost optimization, and fallback mechanisms.
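A fallback mechanism of this kind can be sketched without committing to any provider SDK. In the example below each model is a named callable, listed in order of preference (for instance, cheapest first); the names `generate_with_fallback` and `AllModelsFailed` are hypothetical, and real provider clients would be wrapped in such callables.

```python
from typing import Callable, Sequence


class AllModelsFailed(RuntimeError):
    """Raised when every configured model raised an error."""


def generate_with_fallback(
    prompt: str,
    models: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, call) pair in order; return (model_name, reply)."""
    errors: list[str] = []
    for name, call in models:
        try:
            return name, call(prompt)
        except Exception as exc:  # any provider error triggers the next model
            errors.append(f"{name}: {exc}")
    raise AllModelsFailed("; ".join(errors))
```

Returning the model name alongside the reply makes it possible to log which provider actually answered, which is useful when tracking cost per request.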
6. Chain-of-Thought Reasoning
Automatically identifies complex queries and generates step-by-step reasoning for them, improving answer quality and interpretability.
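One simple way to realize this is a heuristic complexity check that switches the prompt into a step-by-step format. The marker list, the length threshold, the prompt wording, and the names `is_complex`, `build_prompt`, and `extract_answer` below are illustrative assumptions; the actual system may detect complexity differently (for example, with an LLM classifier).

```python
COMPLEX_MARKERS = ("compare", "why", "how many", "difference", "versus", "trend")


def is_complex(query: str, max_simple_words: int = 20) -> bool:
    """Heuristic: marker phrases or a long query suggest multi-step reasoning."""
    lowered = query.lower()
    return (any(marker in lowered for marker in COMPLEX_MARKERS)
            or len(lowered.split()) > max_simple_words)


def build_prompt(query: str, context: str) -> str:
    """Add a chain-of-thought instruction only for complex queries."""
    base = f"Context:\n{context}\n\nQuestion: {query}\n"
    if is_complex(query):
        return (base + "Think step by step. List your reasoning as numbered "
                "steps, then give the final answer on a line starting with "
                "'Answer:'.")
    return base + "Answer concisely."


def extract_answer(reply: str) -> str:
    """Return the text after the last 'Answer:' line, or the whole reply."""
    for line in reversed(reply.splitlines()):
        if line.startswith("Answer:"):
            return line[len("Answer:"):].strip()
    return reply.strip()
```

Keeping the reasoning steps and the final answer on separate lines lets the system show the steps for interpretability while still extracting a clean answer.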
7. Query Routing
Automatically classifies each query by type (simple Q&A, comparison queries, etc.) and selects the corresponding processing flow.
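The routing idea can be sketched as a classifier plus a dispatch table. Only the two query types named above are modeled; the keyword pattern and the names `QueryType`, `classify`, and `route` are hypothetical, and a production router might use an LLM or a trained classifier instead of a regular expression.

```python
import re
from enum import Enum
from typing import Callable


class QueryType(Enum):
    SIMPLE = "simple"
    COMPARISON = "comparison"


# Hypothetical keyword cues for comparison queries.
COMPARISON_PATTERN = re.compile(
    r"\b(compare[ds]?|comparison|versus|vs|difference between)\b",
    re.IGNORECASE,
)


def classify(query: str) -> QueryType:
    """Label a query as a comparison if it contains a comparison cue."""
    if COMPARISON_PATTERN.search(query):
        return QueryType.COMPARISON
    return QueryType.SIMPLE


def route(query: str,
          handlers: dict[QueryType, Callable[[str], str]]) -> str:
    """Dispatch the query to the handler registered for its type."""
    return handlers[classify(query)](query)
```

A dispatch table keeps each processing flow independent: adding a new query type means adding one enum member, one classification rule, and one handler.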