Zing Forum

Hybrid RAG: An End-to-End Retrieval-Augmented Generation Solution Combining Keyword and Semantic Search

A complete RAG pipeline implementation that combines dense vector retrieval and sparse keyword search, integrating Cross-Encoder re-ranking, local LLM inference, RAGAS evaluation, and LangSmith observability

Tags: RAG, Hybrid Retrieval, Dense Vector Search, Sparse Keyword Search, Cross-Encoder, LLM Inference, RAGAS Evaluation, LangSmith
Published 2026-06-16 01:16 | Recent activity 2026-06-16 01:22 | Estimated read 6 min

Section 01

Hybrid RAG: Introduction to the End-to-End Retrieval-Augmented Generation Solution Combining Keyword and Semantic Search

Basic Project Information

  • Original Author/Maintainer: DEVANSHU-KALI
  • Source Platform: GitHub
  • Original Link: https://github.com/DEVANSHU-KALI/Hybrid_RAG-Combining-keyword-and-semantic-search
  • Core Solution: This project provides a production-ready, end-to-end RAG pipeline that combines dense vector retrieval with sparse keyword search. It adds Cross-Encoder re-ranking, local LLM inference, RAGAS evaluation, and LangSmith observability, and targets the weakness of vector-only RAG systems on exact-match queries.

Section 02

Evolution of Retrieval-Augmented Generation and Background of Hybrid Retrieval

Retrieval-Augmented Generation (RAG) is a mainstream way to reduce LLM hallucinations and to work around stale model knowledge. However, traditional RAG relies on purely semantic vector search, which performs poorly when a query requires an exact match on proper nouns, product model numbers, code identifiers, and similar strings. Hybrid retrieval closes this gap by combining the semantic understanding of dense retrieval with the precision of keyword matching, improving retrieval quality across a wider range of queries.

Section 03

Project Architecture and the Hybrid Retrieval Mechanism in Detail

Core Architecture Components

  • Hybrid Retrieval Layer: Runs dense vector retrieval and sparse keyword search on each query
  • Intelligent Re-ranking: A Cross-Encoder model re-scores the initial results
  • Local LLM Inference: Supports private deployment
  • Quality Evaluation: Uses the RAGAS framework
  • Observability: LangSmith tracing and monitoring

Hybrid Retrieval Mechanism

  • Dense Vector Retrieval: Uses an embedding model (for example, one loaded through the sentence-transformers library) to encode text as vectors and scores semantic relevance by vector similarity; excels at conceptual queries
  • Sparse Keyword Search: Uses an inverted index with BM25 scoring to match specific identifiers and technical terms exactly
  • Result Fusion Strategy: Uses reciprocal rank fusion (RRF), a weighted linear combination, or cascaded filtering to balance recall and precision
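
To make the fusion step concrete, here is a minimal sketch of reciprocal rank fusion in plain Python. It is not taken from the project: the function name, the document IDs, and the constant k=60 (a commonly used default) are all assumptions chosen for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranked list.

    Each document receives sum(1 / (k + rank)) over the lists in which it
    appears, with rank starting at 1. k=60 is an assumed default, not a
    value taken from the project.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the two retrievers.
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Because RRF only looks at ranks, it needs no score normalization between the dense and sparse retrievers, which is one reason it is a common first choice for fusion.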

Section 04

Cross-Encoder Re-ranking and Advantages of Local LLM Inference

Cross-Encoder Re-ranking

Initial retrieval returns many candidate documents. A Cross-Encoder concatenates the query with each candidate document, feeds the pair through the model, and outputs a fine-grained relevance score. Because the model processes the query and document jointly, it captures interactions that a Bi-Encoder, which embeds the two separately, can miss. Keeping only the top-scoring candidates narrows the context passed to the LLM and improves generation quality.
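
The re-ranking flow can be sketched as follows. To keep the example runnable without downloading a model, the scorer is a toy token-overlap function; `rerank`, `toy_overlap_scorer`, and the sample texts are hypothetical names and data, not the project's code. In a real pipeline, the `predict` method of a `CrossEncoder` from the sentence-transformers library accepts the same list of (query, document) pairs and could plausibly be passed in place of the toy scorer.

```python
from typing import Callable, List, Sequence, Tuple


def rerank(
    query: str,
    candidates: Sequence[str],
    score_pairs: Callable[[List[Tuple[str, str]]], Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every (query, document) pair jointly and keep the best top_k."""
    pairs = [(query, doc) for doc in candidates]
    scores = score_pairs(pairs)
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


def toy_overlap_scorer(pairs: List[Tuple[str, str]]) -> List[float]:
    """Stand-in for a Cross-Encoder: fraction of query tokens found in the document."""
    scores = []
    for query, doc in pairs:
        q_tokens = set(query.lower().split())
        d_tokens = set(doc.lower().split())
        scores.append(len(q_tokens & d_tokens) / len(q_tokens) if q_tokens else 0.0)
    return scores


# Hypothetical candidates returned by the hybrid retrieval stage.
candidates = [
    "BM25 ranks documents by term frequency",
    "Cross-encoders score a query and a document together",
    "Local inference keeps data on the machine",
]

top = rerank("how do cross-encoders score a document", candidates, toy_overlap_scorer, top_k=2)
for doc, score in top:
    print(f"{score:.2f}  {doc}")
```

Passing the scorer in as a function keeps the ranking logic independent of any particular model, so the toy scorer can be swapped for a real Cross-Encoder without touching `rerank`.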

Local LLM Inference

The pipeline supports local deployment, so sensitive data never leaves the local environment, which helps meet compliance requirements. It also removes the dependency on external APIs, which reduces cost and network latency.

Section 05

RAGAS Evaluation and LangSmith Observability

RAGAS Evaluation Framework

RAGAS provides automated evaluation along several dimensions:

  • Context Relevance: How well the retrieved documents match the query
  • Faithfulness: Whether the generated content is grounded in the retrieved documents (no hallucinations)
  • Answer Relevance: Whether the generated content directly answers the query
  • Context Recall: Whether the retrieved documents contain all of the required information
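
To give a feel for what a metric such as context recall measures, here is a deliberately crude lexical proxy in plain Python. It does not call RAGAS and is not how RAGAS computes its metrics (RAGAS relies on LLM-based judgments); the function name, the 0.6 threshold, and the sample texts are assumptions made only for this sketch.

```python
import re


def _tokens(text):
    """Lowercased alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def lexical_context_recall(reference_statements, contexts, threshold=0.6):
    """Fraction of reference statements whose tokens mostly appear in the contexts.

    A statement counts as supported when at least `threshold` of its tokens
    occur somewhere in the retrieved contexts. This is a rough approximation
    for illustration, not the RAGAS definition.
    """
    if not reference_statements:
        return 0.0
    context_tokens = set()
    for context in contexts:
        context_tokens |= _tokens(context)
    supported = 0
    for statement in reference_statements:
        statement_tokens = _tokens(statement)
        if not statement_tokens:
            continue
        coverage = len(statement_tokens & context_tokens) / len(statement_tokens)
        if coverage >= threshold:
            supported += 1
    return supported / len(reference_statements)


# Hypothetical retrieved contexts and reference statements.
contexts = [
    "BM25 is a ranking function built on an inverted index.",
    "Reciprocal rank fusion merges ranked lists.",
]
reference = [
    "BM25 is built on an inverted index.",
    "Cross-encoders re-rank the candidates.",
]

recall = lexical_context_recall(reference, contexts)
print(recall)
```

A low value here points at the retriever rather than the generator: the information needed for the answer never reached the prompt.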

LangSmith Observability

  • Request Tracing: Records the complete processing flow of each request
  • Latency Analysis: Identifies performance bottlenecks
  • Retrieval Visualization: Shows the retrieved documents and their scores
  • Debugging Support: Helps locate retrieval and generation issues
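
The idea behind per-stage latency analysis can be shown with a standard-library stand-in. This sketch does not use the LangSmith SDK; `StageTimer`, the stage names, and the sleep calls that simulate work are hypothetical and only illustrate the kind of data a tracing tool collects for each request.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Collects (stage name, elapsed seconds) records for one request."""

    def __init__(self):
        self.records = []

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the stage raised an exception.
            self.records.append((name, time.perf_counter() - start))

    def slowest(self):
        """Name of the stage with the longest recorded duration, or None."""
        if not self.records:
            return None
        return max(self.records, key=lambda record: record[1])[0]


timer = StageTimer()
with timer.stage("retrieve"):
    time.sleep(0.005)  # placeholder for hybrid retrieval
with timer.stage("rerank"):
    time.sleep(0.005)  # placeholder for Cross-Encoder scoring
with timer.stage("generate"):
    time.sleep(0.005)  # placeholder for LLM inference

for name, seconds in timer.records:
    print(f"{name}: {seconds * 1000:.1f} ms")
print("slowest stage:", timer.slowest())
```

A hosted tracing tool adds the pieces this sketch leaves out, such as persisting traces, attaching inputs and outputs to each stage, and browsing them in a UI.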

Section 06

Practical Significance and Deployment Recommendations

Practical Significance

This project covers the full stack and is a practical starting point for building enterprise-grade RAG systems: hybrid retrieval handles a wide range of queries, Cross-Encoder re-ranking improves result quality, local LLM inference protects privacy, and RAGAS and LangSmith support continuous optimization.

Deployment Recommendations

  • Adjust the relative weights of dense and sparse retrieval to suit the workload
  • Fine-tune the embedding and re-ranking models on domain-specific data
  • Establish a continuous evaluation feedback loop
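
As an illustration of the first recommendation, the sketch below fuses dense and sparse scores with a single weight, alpha. The function names, the min-max normalization step, and the sample scores are assumptions for this example rather than the project's implementation. Normalization is included because cosine similarities and BM25 scores live on different scales.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping to [0, 1]; constant inputs map to 0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 0.0 for doc_id in scores}
    return {doc_id: (score - lo) / (hi - lo) for doc_id, score in scores.items()}


def weighted_fusion(dense_scores, sparse_scores, alpha=0.5):
    """Combine normalized scores as alpha * dense + (1 - alpha) * sparse.

    alpha=1.0 uses only the dense retriever, alpha=0.0 only the sparse one.
    Documents missing from one retriever contribute 0 for that retriever.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    dense = min_max_normalize(dense_scores)
    sparse = min_max_normalize(sparse_scores)
    fused = {
        doc_id: alpha * dense.get(doc_id, 0.0) + (1 - alpha) * sparse.get(doc_id, 0.0)
        for doc_id in set(dense) | set(sparse)
    }
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical raw scores: cosine similarities and BM25 scores.
dense_scores = {"doc_a": 0.82, "doc_b": 0.75, "doc_c": 0.40}
sparse_scores = {"doc_c": 12.0, "doc_a": 3.0, "doc_d": 9.0}

semantic_heavy = weighted_fusion(dense_scores, sparse_scores, alpha=0.8)
keyword_heavy = weighted_fusion(dense_scores, sparse_scores, alpha=0.2)
print([doc_id for doc_id, _ in semantic_heavy])
print([doc_id for doc_id, _ in keyword_heavy])
```

Sweeping alpha over a labeled query set and tracking the evaluation metrics at each setting is one way to connect this tuning step to the feedback loop in the third recommendation.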