Agentic RAG in Practice: Building an Intelligent Retrieval System Integrating Semantic Search and Lexical Ranking

This article examines the design and implementation of a production-grade RAG system that combines agentic decision-making, vector-based semantic retrieval, and BM25 lexical ranking. Results from the two retrieval paths are merged with Reciprocal Rank Fusion, giving a high-precision approach to complex multi-domain document retrieval.

Tags: RAG, Agentic AI, semantic search, BM25, hybrid retrieval, Reciprocal Rank Fusion, vector database, Claude, VoyageAI, agents

Published 2026-04-21 00:53 | Recent activity 2026-04-21 01:19 | Estimated read 7 min

Section 01

Agentic RAG in Practice: Guide to the Intelligent Retrieval System Integrating Semantic Search and Lexical Ranking

This article introduces the design and implementation of a production-grade RAG system that integrates agentic decision-making, vector semantic retrieval, and BM25 lexical ranking. It merges the two result lists through Reciprocal Rank Fusion (RRF), which addresses the single-strategy limitation of traditional RAG and gives a high-precision approach to complex multi-domain document retrieval. The core architecture consists of an intelligent decision layer, a dual-path retrieval layer, and a fusion ranking layer. Together they let the Claude model decide on its own when to retrieve and which strategy to use, and they adapt to cross-domain queries in areas such as medicine and finance.

Section 02

Background: Limitations of Traditional RAG

In LLM applications, traditional RAG is the standard way to address stale knowledge and hallucinations, but it falls short in complex scenarios: a single retrieval strategy struggles to balance exact matching with semantic understanding; the retrieval and generation stages do not interact dynamically; and recall is insufficient for cross-domain document queries. A more flexible, integrated RAG architecture is therefore needed.

Section 03

System Architecture: Three-Layer Intelligent Design

The system's core architecture is divided into three layers:

Intelligent Decision Layer: Driven by Claude Sonnet 4.6. The model judges for itself whether to retrieve, which strategy to use, and whether to refine the query over multiple rounds, which avoids unnecessary retrieval overhead.
Dual-Path Retrieval Layer: The semantic path uses VoyageAI's voyage-3-large embedding model and matches by cosine or Euclidean distance; the lexical path uses the BM25 algorithm for exact keyword matching. The two paths complement each other.
Fusion Ranking Layer: Merges the results of both paths with the RRF algorithm, reconciling the ranking differences between strategies to produce a more robust final order.
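
To make the lexical path concrete, here is a minimal sketch of Okapi BM25 scoring in plain Python. It is not the project's code: the function name `bm25_scores`, the default values `k1=1.5` and `b=0.75`, the whitespace tokenizer, and the sample documents are all assumptions chosen for illustration, and the IDF term uses one commonly seen smoothed variant.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75, tokenize=str.split):
    """Score each document in `docs` against `query` with Okapi BM25.

    Hypothetical helper for illustration; k1 and b defaults are assumed.
    """
    if not docs:
        return []
    tokenized = [tokenize(d.lower()) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))

    query_terms = tokenize(query.lower())
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF; the "+ 1" inside the log keeps it non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # k1 limits how much repeated terms add; b scales the length penalty.
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = [
    "XDR-471 project budget overrun report",
    "security risks in cloud deployment",
    "quarterly budget summary",
]
scores = bm25_scores("XDR-471 budget", docs)
```

In this toy corpus the first document matches both query terms and scores highest, while the second matches neither and scores zero. Passing a different `tokenize` callable is one way to handle text where whitespace splitting is not adequate.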

Section 04

In-Depth Analysis of Technical Implementation

Technical details include:

Vector Index and Semantic Retrieval: A custom VectorIndex class with adjustable parameters, built on the voyage-3-large embedding model, supports batch embedding and dimension validation. Three chunking strategies are available: fixed length, semantic boundary, and recursive character.
BM25 Lexical Retrieval: Adjustable k1 (term frequency saturation rate) and b (document length normalization) parameters, supporting custom tokenizers (adapted to Chinese, code, etc.).
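RRF Mathematical Principle: A document's fused score is the sum, over all retrieval strategies, of the reciprocal of its rank in each result list: score(d) = Σ_i 1/(k + rank_i(d)), where k is a smoothing constant that usually takes the value 60. Because only ranks are used, the raw scores of different retrievers need no normalization, which makes the fusion robust.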
Agentic Query Flow: Claude analyzes the query → determines retrieval strategy → executes retrieval → evaluates results → multi-round refinement (if needed), improving the quality of answers to complex questions.
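
The RRF step described above can be sketched in a few lines. This is an illustrative version under stated assumptions, not the project's implementation: the function name `reciprocal_rank_fusion`, the document IDs, and the tie-breaking rule are hypothetical, ranks are taken to start at 1, and k defaults to the value 60 mentioned above.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first lists of document IDs into one ranking.

    Each document receives 1 / (k + rank) from every list it appears in,
    with rank starting at 1; documents missing from a list get nothing.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


semantic = ["doc_b", "doc_a", "doc_c"]  # hypothetical vector-search order
lexical = ["doc_a", "doc_d", "doc_b"]   # hypothetical BM25 order
fused = reciprocal_rank_fusion([semantic, lexical])
fused_ids = [doc_id for doc_id, _ in fused]
```

Here `doc_a` ends up first because it ranks near the top of both lists, even though it is not first in the semantic list. Documents found by only one path (`doc_d`, `doc_c`) still appear, but lower down.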

Section 05

Application Scenarios and Practical Effects

The project's test documents cover 10 domains including medicine, software engineering, and finance, simulating enterprise multi-type knowledge base scenarios. For example, the cross-domain query "Financial impact and security risks of the XDR-471 project" requires integrating multi-domain knowledge. The system's decision-making process (whether to retrieve, which path to use, result sorting, etc.) can be intuitively observed through the Streamlit interface, improving debugging transparency.
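
One way to picture the decision process that the interface exposes is a loop that records each step in a trace. The sketch below is hypothetical throughout: `agentic_answer`, `decide`, `retrievers`, `fuse`, and `good_enough` are stand-in names, and in the real system the decision step would be a call to the Claude model rather than a plain Python function.

```python
def agentic_answer(query, decide, retrievers, fuse, good_enough, max_rounds=3):
    """Run a decide -> retrieve -> evaluate loop and keep a readable trace.

    All callables are hypothetical stand-ins:
      decide(query, round_no)     -> {"retrieve": bool, "paths": [...], "query": str}
      retrievers[path](query)     -> best-first list of document IDs
      fuse(list_of_rankings)      -> fused list of document IDs
      good_enough(query, results) -> bool
    """
    trace, results = [], []
    current = query
    for round_no in range(1, max_rounds + 1):
        plan = decide(current, round_no)
        trace.append({"round": round_no, "plan": plan})
        if not plan["retrieve"]:
            break  # the model chose to answer without retrieval
        rankings = [retrievers[path](plan["query"]) for path in plan["paths"]]
        results = fuse(rankings)
        trace[-1]["results"] = results
        if good_enough(query, results):
            break
        current = plan["query"]  # the next round refines the latest query
    return results, trace


# Toy stand-ins that force one refinement round.
def decide(q, round_no):
    refined = q if round_no == 1 else q + " security"
    return {"retrieve": True, "paths": ["semantic", "lexical"], "query": refined}


retrievers = {
    "semantic": lambda q: ["a"],
    "lexical": lambda q: ["b", "c"] if "security" in q else ["a"],
}
fuse = lambda rankings: sorted({d for r in rankings for d in r})
good_enough = lambda q, res: len(res) >= 3

results, trace = agentic_answer("XDR-471 risks", decide, retrievers, fuse, good_enough)
```

The returned `trace` holds one entry per round with the plan and the fused results, which is the kind of record a frontend could render for debugging.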

Section 06

Deployment and Expansion Recommendations

Deployment dependencies are lightweight (Python 3.9+, the Chroma vector database, and a Streamlit frontend), so the system is easy to deploy on a single server or workstation. Expansion directions:

Retrieval path expansion (knowledge graph structured retrieval, metadata filtering);
Agentic strategy evolution (decomposing complex problems into sub-queries for parallel retrieval);
Introducing caching mechanisms (caching results for high-frequency queries to reduce API costs).
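
The caching idea in the last item could look like the following minimal in-memory sketch. The class name `QueryCache`, the one-hour default TTL, and the query normalization rule are assumptions made for illustration; a production deployment would more likely use a shared store and a smarter notion of query equivalence.

```python
import hashlib
import time


class QueryCache:
    """In-memory cache with a time-to-live, keyed by the normalized query."""

    def __init__(self, ttl_seconds=3600.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (stored_at, result)

    @staticmethod
    def _key(query):
        # Lowercase and collapse whitespace so trivial variants share a key.
        normalized = " ".join(query.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get_or_compute(self, query, compute):
        """Return a fresh cached result, or call `compute(query)` and store it."""
        key = self._key(query)
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        result = compute(query)
        self._store[key] = (now, result)
        return result


calls = []


def fake_retrieval(query):
    # Stand-in for an expensive embedding or model API call.
    calls.append(query)
    return ["doc_a", "doc_b"]


cache = QueryCache(ttl_seconds=60)
first = cache.get_or_compute("XDR-471   risks", fake_retrieval)
second = cache.get_or_compute("xdr-471 risks", fake_retrieval)
```

The second call differs only in case and spacing, so it is served from the cache and `fake_retrieval` runs once.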

Section 07

Summary and Outlook

This project demonstrates the key features of next-generation RAG: from passive retrieval to active decision-making, from a single strategy to multi-strategy fusion, and from a black-box process to a transparent, observable one. It offers a useful reference for enterprise knowledge base question-answering systems. Future directions include further agentization, multi-modal retrieval (text, images, and tables), and real-time learning updates.
