Zing Forum

Agentic RAG in Practice: Building an Intelligent Retrieval System Integrating Semantic Search and Lexical Ranking

This article walks through the design and implementation of a production-grade RAG system that combines agentic decision-making, vector-based semantic retrieval, and BM25 lexical ranking. The results of the two retrieval paths are merged with Reciprocal Rank Fusion (RRF), with the goal of high-precision retrieval over complex multi-domain documents.

Tags: RAG, Agentic AI, Semantic Search, BM25, Hybrid Retrieval, Reciprocal Rank Fusion, Vector Database, Claude, VoyageAI, Agent
Published 2026-04-21 00:53 | Recent activity 2026-04-21 01:19 | Estimated read 7 min

Section 01

Agentic RAG in Practice: Guide to the Intelligent Retrieval System Integrating Semantic Search and Lexical Ranking

This article introduces the design and implementation of a production-grade RAG system that integrates agentic decision-making, vector semantic retrieval, and BM25 lexical ranking. It ranks results from both paths with Reciprocal Rank Fusion (RRF), which addresses the main weakness of traditional RAG (reliance on a single retrieval strategy) and targets high-precision retrieval over complex multi-domain documents. The core architecture consists of an intelligent decision layer, a dual-path retrieval layer, and a fusion ranking layer. The Claude model decides on its own when to retrieve and which strategy to use, so the system can handle cross-domain queries in areas such as medicine and finance.

Section 02

Background: Limitations of Traditional RAG

In LLM applications, traditional RAG is the standard way to deal with stale knowledge and hallucinations, but it falls short in complex scenarios. A single retrieval strategy struggles to balance exact matching against semantic understanding, the retrieval and generation stages do not interact dynamically, and recall is insufficient for cross-domain document queries. A more flexible, integrated RAG architecture is therefore needed.

Section 03

System Architecture: Three-Layer Intelligent Design

The system's core architecture is divided into three layers:

  1. Intelligent Decision Layer: Driven by Claude Sonnet 4.6. The model itself judges whether to retrieve, which strategy to use, and whether to refine the query over multiple rounds, which avoids unnecessary retrieval overhead.
  2. Dual-Path Retrieval Layer: The semantic path uses VoyageAI's voyage-3-large embedding model and matches by cosine similarity or Euclidean distance. The lexical path uses the BM25 algorithm for exact keyword matching. The two paths complement each other.
  3. Fusion Ranking Layer: Merges the two result lists with the RRF algorithm, reconciling the ranking differences between strategies to produce a more robust final order.
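
To make the lexical path concrete, here is a minimal BM25 scorer. It is an illustrative sketch, not the project's code: the whitespace tokenizer, the function names, and the defaults k1=1.5 and b=0.75 are assumptions, and the IDF uses the common log(1 + ...) variant so that term weights stay non-negative. It also assumes a non-empty document list.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    # A real system would swap in a tokenizer suited to its corpus.
    return text.lower().split()


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with BM25.

    Returns (order, scores): document indices sorted best-first,
    and the raw BM25 score of each document in input order.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # k1 controls term-frequency saturation, b controls how
            # strongly long documents are penalized.
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)

    order = sorted(range(n_docs), key=lambda i: scores[i], reverse=True)
    return order, scores
```

Calling `bm25_rank("security risks", docs)` returns the best-first index order that a fusion step can consume directly. Only the order matters downstream, so the absolute score values never need to be compared with embedding distances.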

Section 04

In-Depth Analysis of Technical Implementation

Technical details include:

  • Vector Index and Semantic Retrieval: A custom VectorIndex class with adjustable parameters wraps the voyage-3-large embedding model and supports batch embedding and dimension verification. Three chunking strategies are available: fixed length, semantic boundary, and recursive character splitting.
  • BM25 Lexical Retrieval: The k1 (term frequency saturation) and b (document length normalization) parameters are adjustable, and custom tokenizers can be plugged in for Chinese text, code, and similar inputs.
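  • RRF Mathematical Principle: A document's score is the sum, over all retrieval strategies, of the reciprocal of its smoothed rank: score(d) = Σ_i 1/(k + rank_i(d)), where rank_i(d) is the document's position in strategy i's result list and k is a smoothing constant, usually 60. Because only ranks are used, the raw scores of the different paths never need to be normalized, which makes the fusion robust.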
  • Agentic Query Flow: Claude analyzes the query → determines retrieval strategy → executes retrieval → evaluates results → multi-round refinement (if needed), improving the quality of answers to complex questions.
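
The RRF step described above fits in a few lines. The following is a minimal sketch under stated assumptions, not the project's implementation: the function name and the document IDs are hypothetical, and each input list is assumed to be ordered best-first with no duplicates.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first lists of document IDs into one ranking.

    Each document receives 1 / (k + rank) from every list it appears
    in (ranks start at 1), and the contributions are summed.
    Returns a list of (doc_id, fused_score) pairs, best first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical results from the semantic and lexical paths.
semantic = ["doc_a", "doc_b", "doc_c"]
lexical = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([semantic, lexical])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

In this example doc_b comes out ahead of doc_a: it is ranked second and first (1/62 + 1/61), while doc_a is ranked first and third (1/61 + 1/63). Documents found by only one path (doc_c, doc_d) still appear, but below those that both paths agree on.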

Section 05

Application Scenarios and Practical Effects

The project's test documents cover 10 domains, including medicine, software engineering, and finance, to simulate an enterprise knowledge base with many document types. For example, the cross-domain query "Financial impact and security risks of the XDR-471 project" can only be answered by combining knowledge from several domains. The Streamlit interface shows the system's decision process (whether it retrieved, which path it used, how results were ranked, and so on), which makes debugging more transparent.

Section 06

Deployment and Expansion Recommendations

Deployment dependencies are lightweight (Python 3.9+, the Chroma vector database, and a Streamlit frontend), so the system is easy to run on a single server or workstation. Expansion directions:

  1. Retrieval path expansion (knowledge graph structured retrieval, metadata filtering);
  2. Agentic strategy evolution (decomposing complex problems into sub-queries for parallel retrieval);
  3. Introducing caching mechanisms (caching results for high-frequency queries to reduce API costs).
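
As one possible shape for the caching idea, the sketch below memoizes retrieval results per normalized query using the standard library's functools.lru_cache. Everything here is an assumption for illustration: `expensive_retrieve` is a hypothetical stand-in for the embedding call plus index lookup, and the normalization rule and cache size are arbitrary choices.

```python
from functools import lru_cache

# Counts how often the (simulated) expensive path actually runs.
CALLS = {"count": 0}


def expensive_retrieve(query):
    # Hypothetical stand-in for an embedding API call and vector search.
    CALLS["count"] += 1
    # Return a tuple so cached results cannot be mutated by callers.
    return (f"result for {query}",)


def normalize(query):
    # Collapse case and whitespace so trivially different
    # spellings of the same query share one cache entry.
    return " ".join(query.lower().split())


@lru_cache(maxsize=1024)
def _cached_retrieve(normalized_query):
    return expensive_retrieve(normalized_query)


def cached_retrieve(query):
    """Retrieve results, reusing cached output for repeated queries."""
    return _cached_retrieve(normalize(query))
```

An in-process cache like this is lost on restart and is never invalidated when the index changes. A deployment that updates its documents would need to clear it (for example with `_cached_retrieve.cache_clear()`) after re-indexing.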

Section 07

Summary and Outlook

This project demonstrates three key shifts in next-generation RAG: from passive retrieval to active decision-making, from a single strategy to fused multi-strategy retrieval, and from a black-box process to a transparent, observable one. It is a useful reference for enterprise knowledge base question-answering systems. Future directions include deeper agentic behavior, multi-modal retrieval (text, images, and tables), and real-time learning updates.