Zing Forum

RAG System Based on MDN French Documentation: A Complete Implementation from Theory to Practice

This article introduces a complete implementation of a Retrieval-Augmented Generation (RAG) system based on MDN French technical documentation. Through comparative experiments, it verifies the significant advantages of RAG over pure LLMs and explores the effect of fine-tuning embedding models on improving retrieval quality.

Tags: RAG, Retrieval-Augmented Generation, large language models, vector retrieval, embedding models, FAISS, MDN documentation, French NLP, Mistral, e5 model
Published 2026-06-08 08:42 | Recent activity 2026-06-08 08:49 | Estimated read 6 min
Section 01

[Introduction] RAG System Based on MDN French Documentation: A Complete Implementation from Theory to Practice

This article presents a complete implementation of a Retrieval-Augmented Generation (RAG) system built on the French MDN documentation. The study addresses three questions: how RAG compares with a pure LLM, how the number of retrieved paragraphs k affects results, and whether fine-tuning the embedding model pays off. The experiments show a clear advantage for RAG over a pure LLM, and show that domain fine-tuning improves the retrieval quality of the embedding model. The project also serves as a reproducible reference implementation for developers building their own RAG systems.

Section 02

Project Background and Core Questions

Technical documentation (for HTML, CSS, and JavaScript, for example) is large in volume, precise in content, and continuously updated. A pure LLM answers questions from its parametric memory alone, so its answers can be inaccurate, outdated, or impossible to verify against a source. RAG addresses this by first retrieving relevant paragraphs and then generating an answer grounded in them. The core research questions of this project are: 1. Does RAG significantly improve answer quality? 2. What is the optimal number of retrieved paragraphs k? 3. Does domain fine-tuning of the embedding model improve retrieval and generation?

Section 03

System Architecture Design

The RAG system uses a two-stage architecture: a retriever followed by a generator. The retriever uses the intfloat/multilingual-e5-base embedding model, splits the French MDN documents into paragraphs of about 800 characters, and builds a vector index with FAISS. It applies the prefixes that E5 models expect ("query: " for questions, "passage: " for indexed text). The generator is unsloth/mistral-7b-instruct-v0.3 with 4-bit quantization, run with a temperature of 0.3 and at most 256 new tokens. The pipeline is: user question → retrieve k relevant paragraphs → assemble the prompt → generate an answer with its sources.
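The retrieve-then-prompt flow described above can be sketched in a few lines. This is a minimal sketch, not the project's code: a brute-force cosine search stands in for the FAISS index, the vectors are toy values rather than real e5 embeddings, and the names `retrieve`, `build_prompt`, and the prompt wording are hypothetical.

```python
import math

# Prefixes expected by E5 embedding models before encoding.
QUERY_PREFIX = "query: "
PASSAGE_PREFIX = "passage: "


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, passage_vecs, k):
    """Return the indices of the k most similar passages, best first.

    Brute-force stand-in for a vector index such as FAISS.
    """
    ranked = sorted(
        range(len(passage_vecs)),
        key=lambda i: cosine(query_vec, passage_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, passages):
    """Combine numbered passages and the question into one prompt."""
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using only the sources below. "
        "Cite the source numbers you used.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


# Toy data: in a real system the vectors would come from encoding
# PASSAGE_PREFIX + text and QUERY_PREFIX + question with the embedding model.
passages = [
    "La balise <a> crée un lien hypertexte.",
    "La propriété color définit la couleur du texte.",
    "L'attribut href indique la cible du lien.",
]
passage_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
query_vec = [1.0, 0.0, 0.0]

top = retrieve(query_vec, passage_vecs, k=2)  # [0, 2]
prompt = build_prompt("Comment créer un lien ?", [passages[i] for i in top])
```

Numbering the passages in the prompt is one simple way to let the generator point back to its sources; the resulting prompt would then be passed to the generator with the temperature and token limit given above.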

Section 04

Data Preparation and Experiment Design

The data source is the French MDN technical documentation (HTML, CSS, and JS guides), and content is extracted via sparse retrieval. Preprocessing steps: remove tags → split into 800-character paragraphs with a 120-character overlap → filter out short paragraphs, which leaves about 8,943 valid paragraphs. The evaluation dataset consists of automatically generated (question, answer, source paragraph) triples and is versioned. Experiment design: retrieval performance is measured with hit@k and MRR; generation quality is compared between RAG and a pure LLM using EM, F1, and ROUGE-L; and both are measured again after fine-tuning the embedding model for 2 epochs.
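The splitting step can be illustrated with a sliding window over the cleaned text. The sketch below assumes character-based windows, as the 800/120 figures suggest; the function name and the `min_chars` threshold are hypothetical, since the article does not say how short a paragraph must be to get filtered out.

```python
def split_into_chunks(text, size=800, overlap=120, min_chars=50):
    """Split text into overlapping character windows.

    size and overlap follow the figures quoted in the article;
    min_chars is an assumed cutoff for discarding short fragments.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap  # each window starts 680 characters after the last
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + size].strip()
        if len(chunk) >= min_chars:
            chunks.append(chunk)
        if start + size >= len(text):
            break  # the current window already reached the end of the text
    return chunks


# Toy input: 2000 characters yield windows of 800, 800, and 640 characters.
sample = "".join(chr(97 + i % 26) for i in range(2000))
chunks = split_into_chunks(sample)
```

The overlap means the last 120 characters of one paragraph reappear at the start of the next, so a sentence cut at a window boundary is still intact in at least one paragraph.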

Section 05

Experimental Results and Analysis

Retrieval performance: the fine-tuned model outperforms the base model on hit@1 (+8%, reaching 0.63) and hit@3 (+9%, reaching 0.90), which indicates that domain fine-tuning improves ranking quality. Generation quality: with the base embedding model, RAG reaches an F1 of 0.312, more than twice the 0.144 of the pure LLM, which confirms the core value of RAG. Fine-tuning brings only a mild further gain in generation quality (F1 rises to 0.325), because the base retriever already recalls most of the correct paragraphs.
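For readers unfamiliar with the retrieval metrics, the sketch below computes hit@k and MRR from ranked results. It assumes each evaluation question has exactly one gold source paragraph, which matches the triples described earlier; the function names and the toy paragraph ids are illustrative and are not the article's data.

```python
def hit_at_k(ranked_ids, gold_id, k):
    """1.0 if the gold paragraph appears among the top k results, else 0.0."""
    return 1.0 if gold_id in ranked_ids[:k] else 0.0


def reciprocal_rank(ranked_ids, gold_id):
    """1/rank of the gold paragraph, or 0.0 if it was not retrieved."""
    for rank, pid in enumerate(ranked_ids, start=1):
        if pid == gold_id:
            return 1.0 / rank
    return 0.0


def evaluate(runs, k):
    """Average hit@k and MRR over (ranked_ids, gold_id) pairs."""
    n = len(runs)
    hit = sum(hit_at_k(ranked, gold, k) for ranked, gold in runs) / n
    mrr = sum(reciprocal_rank(ranked, gold) for ranked, gold in runs) / n
    return hit, mrr


# Toy runs: gold found at rank 1, at rank 2, and not found at all.
runs = [
    (["p3", "p1", "p7"], "p3"),
    (["p2", "p5", "p9"], "p5"),
    (["p4", "p8", "p6"], "p0"),
]
hit1, mrr = evaluate(runs, k=1)  # hit@1 = 1/3, MRR = 0.5
hit3, _ = evaluate(runs, k=3)    # hit@3 = 2/3
```

hit@k only asks whether the right paragraph made it into the top k, while MRR also rewards placing it higher, which is why a gain in hit@1 reflects better ranking rather than better recall.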

Section 06

Key Technical Implementation Points and Limitations

Implementation points: with 4-bit quantization, Mistral-7B runs in 6 GB of VRAM; the code is modular (configuration, retriever, and so on); and a versioned evaluation set ensures reproducibility. Limitations: the EM score for generation is almost zero, because a generative model rarely reproduces the reference text word for word; and broad questions may retrieve irrelevant paragraphs. Future directions: use BERTScore or an LLM as a judge to assess semantic quality, and add a re-ranking step or a relevance threshold.

Section 07

Practical Insights

This project provides a complete reference implementation for developers of RAG systems. Key insights: retrieval quality sets the upper limit on generation quality, so optimizing retrieval is the more cost-effective investment; domain fine-tuning brings clear benefits for embedding models; and a sensible quantization strategy lowers the hardware threshold (for example, the system runs on a T4 GPU).
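A back-of-the-envelope calculation shows why quantization lowers the hardware threshold. The sketch below estimates the memory of the weights alone; the parameter count of roughly 7.2 billion is an approximation, and the estimate ignores quantization constants, activations, and the KV cache, so real usage is higher.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Memory needed for the weights alone, in gigabytes (10**9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 7.2e9  # approximate size of a 7B-class model (assumption)

fp16_gb = weight_memory_gb(N_PARAMS, 16)  # about 14.4 GB
int4_gb = weight_memory_gb(N_PARAMS, 4)   # about 3.6 GB
```

At 16 bits the weights alone nearly fill a 16 GB T4, while at 4 bits they take about a quarter of that, which is consistent with the 6 GB VRAM figure reported above once runtime overhead is added.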