Zing Forum

MRAG-HC: A Hallucination Control System for Retrieval-Augmented Generation in Multilingual Scenarios

This M.Tech degree project at VNIT Nagpur built a multilingual RAG system supporting English, Hindi, and Marathi. It integrates a FAISS vector database, OCR document processing, and semantic search, and reduces hallucinations in large language models through a credibility scoring mechanism.

RAG, LLM, Hallucination Control, Multilingual AI, FAISS, LangChain, Responsible AI, Vector Database, NLP, Machine Learning
Published 2026-06-10 09:50 | Recent activity 2026-06-10 09:51 | Estimated read 5 min

Section 01

Introduction: MRAG-HC—A Trustworthy Retrieval-Augmented Generation System for Multilingual Scenarios

An M.Tech degree project at VNIT Nagpur developed the MRAG-HC system, which supports three languages: English, Hindi, and Marathi. It integrates a FAISS vector database, OCR document processing, and semantic search, and reduces hallucinations in large language models through mechanisms such as credibility scoring. Its goal is to provide trustworthy multilingual AI question-answering services.


Section 02

Project Background and Motivation

Large language models (LLMs) can hallucinate, producing output that is inconsistent with the facts, which undermines their credibility in critical domains. Most RAG systems are optimized only for English and offer limited support for Indian languages such as Hindi and Marathi. This project aims to build a system that both reduces hallucinations and serves multilingual users.


Section 03

System Architecture and Key Technical Components

MRAG-HC is built in two phases. Phase 1 builds the core RAG infrastructure: document ingestion, vector indexing, and the retrieval pipeline. Phase 2 integrates the hallucination control mechanisms. The key technologies are multilingual document processing (three supported languages, with OCR extracting text from PDFs), a FAISS vector database (efficient semantic search), the LangChain framework (orchestration of RAG workflows), and two-stage retrieval (coarse recall with FAISS, followed by re-ranking for fine selection).
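
The two-stage retrieval idea can be illustrated with a minimal sketch. It is not the project's code: the bag-of-words `embed` function, the brute-force `coarse_recall` (a stand-in for a FAISS index lookup), and the term-coverage `rerank` are all hypothetical simplifications chosen so the example runs with the standard library only. A real system would use multilingual embeddings and a learned re-ranker.

```python
from __future__ import annotations

import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: stands in for a multilingual model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def coarse_recall(query: str, chunks: list[str], k: int) -> list[str]:
    """Stage 1: cheap similarity search over all chunks.

    Brute force here; this is the role a FAISS index would play.
    """
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def rerank(query: str, candidates: list[str], top_n: int) -> list[str]:
    """Stage 2: rescore only the recalled candidates.

    Hypothetical scorer: fraction of query terms that the chunk covers.
    """
    q_terms = set(query.lower().split())

    def coverage(chunk: str) -> float:
        if not q_terms:
            return 0.0
        return len(q_terms & set(chunk.lower().split())) / len(q_terms)

    return sorted(candidates, key=coverage, reverse=True)[:top_n]


def two_stage_retrieve(
    query: str, chunks: list[str], k: int = 3, top_n: int = 1
) -> list[str]:
    """Coarse recall over everything, then fine selection among the top k."""
    return rerank(query, coarse_recall(query, chunks, k), top_n)
```

The split matters because the second stage can afford a more expensive scoring function: it only sees the `k` candidates that survived the first stage, not the whole corpus.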


Section 04

Core Mechanisms for Hallucination Control

  1. Source-anchored generation: Forcing generated content to be based on retrieved fragments, constraining the model to only use the provided context.
  2. Credibility scoring: Combining relevance between retrieved fragments and queries, consistency between generated content and fragments, and model confidence. Low-confidence responses trigger warnings or manual review.
  3. Fact verification layer: Cross-checking multiple retrieval sources to confirm the accuracy of key information.
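
One way to picture the credibility scoring mechanism is as a weighted combination of the three signals with two thresholds. The sketch below is an assumption-laden illustration: the weights, the thresholds `WARN_BELOW` and `REVIEW_BELOW`, and the names `Signals`, `credibility_score`, and `decide` are hypothetical, since the project summary does not state how the signals are actually combined.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    relevance: float    # retrieved fragments vs. query, in [0, 1]
    consistency: float  # generated answer vs. fragments, in [0, 1]
    confidence: float   # model confidence, in [0, 1]


# Hypothetical weights and thresholds, chosen only for illustration.
WEIGHTS = {"relevance": 0.4, "consistency": 0.4, "confidence": 0.2}
WARN_BELOW = 0.7
REVIEW_BELOW = 0.4


def credibility_score(signals: Signals) -> float:
    """Weighted average of the three signals; rejects out-of-range inputs."""
    for name in WEIGHTS:
        value = getattr(signals, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return sum(weight * getattr(signals, name) for name, weight in WEIGHTS.items())


def decide(score: float) -> str:
    """Map a score to an action: answer, answer with a warning, or manual review."""
    if score < REVIEW_BELOW:
        return "manual_review"
    if score < WARN_BELOW:
        return "warn"
    return "answer"
```

For example, `decide(credibility_score(Signals(0.5, 0.6, 0.5)))` falls between the two thresholds and yields `"warn"`. In practice the thresholds would need tuning per language, since retrieval quality can differ between English, Hindi, and Marathi.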

Section 05

Practical Application Scenarios

The system is applicable to Indian government agencies and enterprises. Use cases include policy document queries (citizens get accurate official answers in their native language), multilingual knowledge bases (consistent information across languages within an enterprise), and educational assistance (students ask academic questions in Hindi or Marathi, and the system retrieves from English textbooks and translates).


Section 06

Technical Implementation Details

The system is developed in Python. The tech stack includes LangChain (RAG pipeline orchestration), FAISS (vector storage), Hugging Face Transformers (multilingual embeddings and LLMs), PyPDF and Tesseract (PDF processing and OCR), and FastAPI (API service). The modular design facilitates extension and maintenance.
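
A modular design of this kind could look roughly like the following sketch, in which the retriever and generator sit behind small interfaces so that either can be swapped without touching the pipeline. Every name here (`Retriever`, `Generator`, `RagPipeline`, `KeywordRetriever`, `EchoGenerator`) is hypothetical and not taken from the project; the two concrete classes are trivial stand-ins for a FAISS-backed retriever and an LLM.

```python
from __future__ import annotations

from typing import Protocol


class Retriever(Protocol):
    def retrieve(self, query: str) -> list[str]: ...


class Generator(Protocol):
    def generate(self, query: str, context: list[str]) -> str: ...


class RagPipeline:
    """Wires a retriever to a generator; neither is tied to a specific library."""

    def __init__(self, retriever: Retriever, generator: Generator) -> None:
        self.retriever = retriever
        self.generator = generator

    def answer(self, query: str) -> str:
        context = self.retriever.retrieve(query)
        if not context:
            # Refuse rather than generate without any supporting passage.
            return "No supporting passage found."
        return self.generator.generate(query, context)


class KeywordRetriever:
    """Stand-in retriever: returns chunks sharing at least one term with the query."""

    def __init__(self, chunks: list[str]) -> None:
        self.chunks = chunks

    def retrieve(self, query: str) -> list[str]:
        terms = set(query.lower().split())
        return [c for c in self.chunks if terms & set(c.lower().split())]


class EchoGenerator:
    """Stand-in generator: quotes the first retrieved passage instead of calling an LLM."""

    def generate(self, query: str, context: list[str]) -> str:
        return f"Based on the retrieved passage: {context[0]}"
```

Usage: `RagPipeline(KeywordRetriever(chunks), EchoGenerator()).answer("what is faiss")`. Replacing `KeywordRetriever` with a vector-store-backed class would require no change to `RagPipeline`, which is the maintenance benefit a modular layout aims for.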


Section 07

Project Significance and Summary

Significance: The project reflects responsible AI, because it addresses credibility at the algorithm level, and AI democratization, because supporting native languages promotes technology inclusion. Summary: MRAG-HC combines retrieval-augmented generation with hallucination control to provide trustworthy services for multilingual users, pairing technical innovation with a focus on reliability and inclusiveness.