Zing Forum


Cezzis RAG Workflow: Intelligent Retrieval and Q&A System for Cocktail Data

An in-depth analysis of a complete RAG system implementation, demonstrating how to combine Azure Cosmos DB, the Qdrant vector database, and an Ollama-served large language model to build a semantic search and conversational Q&A service for the cocktail domain.

RAG · Retrieval-Augmented Generation · Vector Database · Qdrant · Ollama · Text Embedding · Semantic Search · Cocktail Data
Published 2026-04-28 08:32 · Recent activity 2026-04-28 08:58 · Estimated read 7 min

Section 01

[Introduction] Cezzis RAG Workflow: Core Analysis of Intelligent Retrieval and Q&A System in the Cocktail Domain

Retrieval-Augmented Generation (RAG) is a mainstream paradigm for large language model application development. The Cezzis RAG workflow demonstrates an end-to-end implementation of a semantic search and conversational Q&A system in the cocktail domain. The project combines Azure Cosmos DB (structured data storage), Qdrant (vector database), Ollama (local LLM inference), and TEI (text embedding) to address the knowledge limitations and hallucination issues of purely parametric models, making it an excellent case study in how the components of a modern RAG architecture work together.


Section 02

Project Background and Architecture Overview

The core goal of the Cezzis RAG workflow is to let users query cocktail-related information in natural language and get accurate, contextually relevant answers. The system covers the entire pipeline of data ingestion, vectorization, index construction, retrieval strategy, and generative Q&A. The technology stack is typical: Python as the main language, Azure Cosmos DB for storing basic cocktail information, Qdrant for semantic indexing and approximate nearest neighbor search, Ollama for local large-model inference, and TEI with the intfloat/e5-base-v2 model for text embedding, with the whole service exposed via a REST API.
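The stack above can be captured in one small configuration object. The following is a minimal sketch, not code from the project: the field names and endpoint URLs are assumptions (the ports shown are the common defaults for a local Qdrant and Ollama).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RagStackConfig:
    """Hypothetical settings object tying the four backing services together."""
    cosmos_endpoint: str  # Azure Cosmos DB account (source of truth)
    qdrant_url: str       # Qdrant vector index
    tei_url: str          # TEI server hosting the embedding model
    ollama_url: str       # local Ollama inference endpoint
    embedding_model: str = "intfloat/e5-base-v2"
    collection: str = "cocktails"  # illustrative collection name


config = RagStackConfig(
    cosmos_endpoint="https://example-account.documents.azure.com:443/",
    qdrant_url="http://localhost:6333",    # Qdrant's default HTTP port
    tei_url="http://localhost:8080",       # assumed host-mapped TEI port
    ollama_url="http://localhost:11434",   # Ollama's default port
)
```

Keeping the four endpoints in one frozen dataclass mirrors the decoupling the article emphasizes: any single service can be swapped by changing one field.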


Section 03

Data Ingestion Layer: From Cosmos DB to Processing Pipeline

A RAG system is only as good as its knowledge base. The data source for Cezzis is Azure Cosmos DB (a fully managed NoSQL database well suited to storing semi-structured cocktail data such as recipes, ingredients, and preparation methods). Data ingestion has to handle cleaning (inconsistent formats, missing fields, duplicate records), content extraction (converting records to embeddable text), and incremental updates (full and incremental synchronization modes that minimize downtime). The project implements a configurable pipeline that supports both synchronization modes.
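The cleaning and extraction step can be sketched as a pure function that flattens one record into a single embeddable passage. The field names (`name`, `category`, `ingredients`, `instructions`) are hypothetical, since the article does not show the actual Cosmos DB schema:

```python
def record_to_passage(record: dict) -> str:
    """Flatten a semi-structured cocktail record into one embeddable passage.

    Handles the cleaning problems named in the article: missing fields,
    duplicate entries, and inconsistent whitespace/formatting.
    """
    name = (record.get("name") or "").strip()
    category = (record.get("category") or "").strip()

    # Deduplicate ingredients case-insensitively while preserving order.
    seen: set[str] = set()
    ingredients: list[str] = []
    for item in record.get("ingredients", []):
        item = item.strip()
        if item and item.lower() not in seen:
            seen.add(item.lower())
            ingredients.append(item)

    # Collapse all whitespace (newlines, double spaces) in the instructions.
    instructions = " ".join((record.get("instructions") or "").split())

    parts = [p for p in (
        name,
        f"Category: {category}" if category else "",
        "Ingredients: " + ", ".join(ingredients) if ingredients else "",
        instructions,
    ) if p]
    return ". ".join(parts)
```

An incremental sync would then re-embed only the records whose passage text changed since the last run, e.g. by comparing a hash of this output.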


Section 04

Embedding and Indexing: Collaboration Between TEI and Qdrant

Text embedding is the core of the pipeline. Cezzis uses the intfloat/e5-base-v2 model (which performs well on semantic-similarity tasks). TEI (Text Embeddings Inference, developed by Hugging Face) optimizes embedding inference with dynamic batching, GPU acceleration, and concurrent request handling, decoupling embedding computation from the rest of the system so it can scale independently. The generated vectors are stored in Qdrant (an open-source vector database), which supports multiple distance metrics, HNSW-based approximate nearest neighbor search, hybrid queries (vector similarity plus structured filtering), and payload metadata such as cocktail names and categories.
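One practical detail with e5-family models: per the model card, inputs should carry a role prefix (`query: ` or `passage: `), or retrieval quality degrades. A minimal sketch of building a request body for TEI's `/embed` endpoint with that convention, plus the cosine similarity typically chosen as the Qdrant distance metric for these vectors (the exact collection settings are not shown in the article):

```python
import json
import math


def embed_request_body(texts: list[str], role: str) -> str:
    """Build the JSON body for TEI's POST /embed endpoint with e5 role prefixes."""
    if role not in ("query", "passage"):
        raise ValueError("role must be 'query' or 'passage'")
    return json.dumps({"inputs": [f"{role}: {t}" for t in texts]})


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity -- a common distance metric for the Qdrant collection."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Queries are embedded with the `query` role at search time; passages get the `passage` role at ingestion time, so the two sides of the similarity computation match the model's training setup.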


Section 05

Retrieval Strategy and Generation Layer Implementation

Retrieval quality determines answer quality. Cezzis implements a multi-layer strategy: a basic layer (vector similarity search to capture semantic relevance), an enhancement layer (hybrid search combining vector and BM25 keyword matching), and an advanced layer (re-ranking, with a cross-encoder re-scoring candidate documents). The generation layer uses Ollama's local model service. The key to prompt engineering is constructing structured prompts that contain system instructions, retrieved context, and the user query, which requires managing the context window, balancing relevance against diversity, and handling conflicting information.
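The article does not say how the vector and BM25 result lists are merged; Reciprocal Rank Fusion (RRF) is one common choice and is easy to sketch:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document ids with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is the value from the original RRF paper;
    it damps the influence of any single list's top positions.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

The fused list would then feed the cross-encoder re-ranking layer, which re-scores only this short candidate set rather than the whole corpus.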


Section 06

Agentic Workflow and API Service Layer

The agentic workflow goes beyond traditional RAG and introduces iterative decision-making: query rewriting (reformulating and retrying when results are unsatisfactory), multi-step retrieval (deepening gradually, e.g., first the base spirit, then the recipe), and self-verification (judging whether an answer is sufficient and requesting additional retrieval when it is not). All capabilities are exposed via a REST API that follows common best practices: clear resource naming, consistent response formats, key endpoints (semantic search, Q&A, streaming responses), and management functions (index status queries, manual re-ingestion, usage statistics).
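The iterative loop described above can be sketched with the retrieval, generation, verification, and rewriting steps injected as callables. This is an illustrative skeleton, not the project's actual control flow; in practice the callables would wrap the Qdrant search and Ollama generation endpoints.

```python
from typing import Callable


def agentic_answer(
    question: str,
    retrieve: Callable[[str], list[str]],
    generate: Callable[[str, list[str]], str],
    is_sufficient: Callable[[str], bool],
    rewrite: Callable[[str], str],
    max_rounds: int = 3,
) -> str:
    """Iterative RAG loop: retrieve, generate, self-verify, and rewrite the
    query for another round when the draft answer is judged insufficient."""
    query = question
    answer = ""
    for _ in range(max_rounds):
        context = retrieve(query)
        answer = generate(question, context)
        if is_sufficient(answer):
            return answer
        query = rewrite(query)  # e.g. ask the LLM to reformulate the query
    return answer  # best effort after max_rounds
```

Capping the loop with `max_rounds` keeps latency bounded, which matters once the workflow sits behind a synchronous REST endpoint.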


Section 07

Deployment Considerations and Best Practice Insights

Deployment is flexible: Qdrant and Ollama can run locally (suiting privacy-sensitive or offline scenarios), the managed Azure Cosmos DB reduces operational burden, and TEI can be deployed independently or as a sidecar. For scalability, TEI and Ollama instances scale horizontally, Qdrant supports distributed storage, and the stateless API fits Kubernetes auto-scaling. The broader insights: data quality is the foundation, decoupled components are easy to optimize or replace, and complexity should grow progressively (from basic RAG to agentic workflows), making this an adaptable architecture template for domain-specific RAG systems.