# RAG System Based on MDN French Documentation: A Complete Implementation from Theory to Practice

> This article introduces a complete implementation of a Retrieval-Augmented Generation (RAG) system based on MDN French technical documentation. Through comparative experiments, it verifies the significant advantages of RAG over pure LLMs and explores the effect of fine-tuning embedding models on improving retrieval quality.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-06-08T00:42:37.000Z
- Last activity: 2026-06-08T00:49:50.857Z
- Popularity: 156.9
- Keywords: RAG, Retrieval-Augmented Generation, large language models, vector retrieval, embedding models, FAISS, MDN documentation, French NLP, Mistral, e5 model, model fine-tuning
- Page link: https://www.zingnex.cn/en/forum/thread/mdnrag
- Canonical: https://www.zingnex.cn/forum/thread/mdnrag

---

## Introduction

This article presents a complete implementation of a Retrieval-Augmented Generation (RAG) system built on the MDN French documentation. The work addresses three questions: how RAG compares with a pure LLM, how the number of retrieved paragraphs k affects results, and whether fine-tuning the embedding model is worthwhile. The experiments show that RAG clearly outperforms a pure LLM and that domain fine-tuning improves the embedding model's retrieval quality. The project is a reproducible reference implementation and offers practical guidance for developers building RAG systems.

## Project Background and Core Questions

Technical documentation (e.g., for HTML, CSS, and JavaScript) is large, precise, and continuously updated. A pure LLM answers from its parametric memory, so its answers can be inaccurate, outdated, or impossible to verify. RAG addresses this by first retrieving relevant paragraphs and then generating an answer grounded in them. The project investigates three research questions:

1. Does RAG significantly improve answer quality?
2. What is the optimal number of retrieved paragraphs k?
3. Does domain fine-tuning of the embedding model improve retrieval and generation?

## System Architecture Design

The RAG system follows a two-stage Retriever + Generator architecture. The retriever uses the `intfloat/multilingual-e5-base` embedding model. The MDN French documents are split into paragraphs of about 800 characters and indexed with FAISS, and texts are encoded with the prefixes that e5 models expect (`query: ` for questions, `passage: ` for documents). The generator is `unsloth/mistral-7b-instruct-v0.3` with 4-bit quantization, run with a temperature of 0.3 and at most 256 new tokens. The pipeline is: user question → retrieve the k most relevant paragraphs → assemble the prompt → generate an answer that cites its sources.
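The retrieve-then-generate flow can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the project's code. The hypothetical `embed` function is a toy bag-of-words vector standing in for `intfloat/multilingual-e5-base`, and a brute-force cosine search stands in for a FAISS flat inner-product index over normalized vectors. The prompt template wording is also invented. Only the `query: `/`passage: ` prefixes and the generation settings come from the description above.

```python
import math
import re
from collections import Counter

# Generation settings described for mistral-7b-instruct-v0.3 (would be passed to the generator).
GEN_KWARGS = {"temperature": 0.3, "max_new_tokens": 256}


def embed(text: str) -> Counter:
    """Hypothetical toy embedder: bag of lowercase words (a real system calls the e5 encoder)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity, i.e. the inner product of L2-normalized vectors."""
    dot = sum(count * b[tok] for tok, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(paragraphs: list[str]) -> list[Counter]:
    # e5 convention: documents are encoded with a "passage: " prefix.
    return [embed("passage: " + p) for p in paragraphs]


def retrieve(question: str, index: list[Counter], k: int = 3) -> list[int]:
    # e5 convention: queries are encoded with a "query: " prefix.
    q = embed("query: " + question)
    scores = [cosine(q, d) for d in index]
    # Exhaustive search over every paragraph, highest score first (like a flat index).
    return sorted(range(len(index)), key=lambda i: scores[i], reverse=True)[:k]


def build_prompt(question: str, paragraphs: list[str], hits: list[int]) -> str:
    # Number each retrieved paragraph so the generated answer can cite its sources.
    context = "\n\n".join(f"[{n}] {paragraphs[i]}" for n, i in enumerate(hits, start=1))
    return (
        "Answer using only the excerpts below and cite their numbers.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


docs = [
    "La propriété CSS display définit le type d'affichage d'un élément.",
    "Array.prototype.map crée un nouveau tableau avec les résultats de la fonction.",
    "L'élément HTML <table> représente des données tabulaires.",
]
index = build_index(docs)
hits = retrieve("Que fait Array.prototype.map ?", index, k=1)
prompt = build_prompt("Que fait Array.prototype.map ?", docs, hits)
print(prompt)
```

In a full system, `embed` would call the e5 encoder, the list of vectors would live in a FAISS index, and `prompt` together with `GEN_KWARGS` would be passed to the quantized Mistral model. The toy embedder also counts the prefix words themselves, which a real e5 model instead learns to interpret.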

## Data Preparation and Experiment Design

The data source is the MDN French technical documentation (HTML, CSS, and JavaScript guides), fetched via a sparse checkout. Preprocessing removes markup, splits the text into 800-character paragraphs with a 120-character overlap, and filters out short paragraphs, leaving 8,943 usable paragraphs. The evaluation set consists of automatically generated (question, answer, source paragraph) triples and is versioned. The experiments cover three aspects. Retrieval performance is measured with hit@k and MRR. Generation quality is compared between RAG and a pure LLM using EM, F1, and ROUGE-L. Finally, both are re-measured after fine-tuning the embedding model for two rounds.
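The preprocessing steps can be illustrated with a small chunker. This sketch assumes the pages arrive as HTML-like strings, and its regex-based tag stripping is deliberately crude. `MIN_CHARS = 200` is a hypothetical threshold, since the article does not say what counts as a short paragraph.

```python
import html
import re

CHUNK_SIZE = 800  # characters per paragraph (from the article)
OVERLAP = 120     # characters shared by consecutive paragraphs (from the article)
MIN_CHARS = 200   # hypothetical threshold for dropping "short" paragraphs


def strip_tags(raw: str) -> str:
    """Remove markup, decode HTML entities, and collapse whitespace (crude regex approach)."""
    text = re.sub(r"<[^>]+>", " ", raw)
    text = html.unescape(text)  # decode after stripping so "&lt;div&gt;" survives as text
    return re.sub(r"\s+", " ", text).strip()


def chunk(text: str, size: int = CHUNK_SIZE, overlap: int = OVERLAP) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):  # this window already reaches the end of the text
            break
    return chunks


def preprocess(raw_pages: list[str]) -> list[str]:
    """Strip markup, chunk each page, and keep only sufficiently long paragraphs."""
    paragraphs = []
    for raw in raw_pages:
        paragraphs.extend(c for c in chunk(strip_tags(raw)) if len(c) >= MIN_CHARS)
    return paragraphs


text = "".join(str(i % 10) for i in range(2000))
parts = chunk(text)
print([len(p) for p in parts])  # [800, 800, 640]
```

Fixed character windows ignore sentence and word boundaries. The 120-character overlap limits the damage, because text cut at the end of one paragraph reappears at the start of the next.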

## Experimental Results and Analysis

**Retrieval performance:** The fine-tuned embedding model outperforms the base model: hit@1 improves by 8% to 0.63 and hit@3 by 9% to 0.90. This indicates that domain fine-tuning improves ranking quality.

**Generation quality:** With the base embedding model, RAG reaches an F1 of 0.312, more than twice the 0.144 scored by the pure LLM, which confirms the core value of RAG. Fine-tuning adds only a modest gain (F1 0.325), because the base retriever already recalls most of the correct paragraphs.
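For reference, hit@k and MRR can be computed as follows. This sketch assumes each question has exactly one gold source paragraph, which matches the (question, answer, source paragraph) triples of the evaluation set. The toy rankings are made up and are not the article's data.

```python
def hit_at_k(ranked: list[int], gold: int, k: int) -> float:
    """1.0 if the gold paragraph appears among the top-k results, else 0.0."""
    return 1.0 if gold in ranked[:k] else 0.0


def reciprocal_rank(ranked: list[int], gold: int) -> float:
    """1/rank of the gold paragraph (ranks start at 1), or 0.0 if it was not retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id == gold:
            return 1.0 / rank
    return 0.0


def evaluate(runs: list[tuple[list[int], int]], ks=(1, 3)) -> dict[str, float]:
    """Average hit@k and MRR over (ranked_ids, gold_id) pairs, one pair per question."""
    n = len(runs)
    scores = {f"hit@{k}": sum(hit_at_k(r, g, k) for r, g in runs) / n for k in ks}
    scores["mrr"] = sum(reciprocal_rank(r, g) for r, g in runs) / n
    return scores


# Toy rankings (paragraph ids, best first) and the gold id for four questions.
runs = [([4, 2, 7], 4), ([1, 9, 3], 9), ([5, 6, 8], 2), ([3, 0, 1], 1)]
print(evaluate(runs))  # hit@1 = 0.25, hit@3 = 0.75, MRR ≈ 0.458
```

hit@k only checks whether the gold paragraph is in the top k, while MRR also rewards ranking it higher. A retriever that moves correct paragraphs from rank 3 to rank 1 therefore improves hit@1 and MRR but leaves hit@3 unchanged.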

## Key Technical Implementation Points and Limitations

**Implementation points:** With 4-bit quantization, Mistral-7B runs in 6 GB of VRAM. The code is modular (configuration, retriever, and so on), and the versioned evaluation set makes results reproducible.

**Limitations:** The EM score for generation is close to zero, because a generative model rephrases the answer instead of copying the reference text verbatim. In addition, broad questions can retrieve irrelevant paragraphs. Future directions include using BERTScore or an LLM judge to assess semantic quality, and adding a re-ranking step or a relevance threshold.
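The near-zero EM score is easy to reproduce. This sketch assumes a simplified SQuAD-style normalization (lowercasing, ASCII punctuation removal, whitespace tokenization, no article stripping), since the article does not specify its own normalization. The example sentences are invented. A correct paraphrase scores 0 on EM but gets partial credit from token F1 and ROUGE-L.

```python
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop ASCII punctuation, and split on whitespace (simplified SQuAD style)."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return text.split()


def exact_match(pred: str, gold: str) -> float:
    return float(normalize(pred) == normalize(gold))


def token_f1(pred: str, gold: str) -> float:
    """Harmonic mean of token precision and recall, counting repeated tokens."""
    p, g = normalize(pred), normalize(gold)
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)


def rouge_l(pred: str, gold: str) -> float:
    """ROUGE-L F1 based on the longest common subsequence (LCS) of tokens."""
    p, g = normalize(pred), normalize(gold)
    # dp[i][j] holds the LCS length of p[:i] and g[:j].
    dp = [[0] * (len(g) + 1) for _ in range(len(p) + 1)]
    for i, pt in enumerate(p, 1):
        for j, gt in enumerate(g, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if pt == gt else max(dp[i - 1][j], dp[i][j - 1])
    lcs = dp[-1][-1]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(p), lcs / len(g)
    return 2 * precision * recall / (precision + recall)


gold = "La méthode map() crée un nouveau tableau."
pred = "map() renvoie un nouveau tableau avec les résultats."
print(exact_match(pred, gold), round(token_f1(pred, gold), 3), round(rouge_l(pred, gold), 3))
# 0.0 0.533 0.533
```

Because the lexical metrics still penalize valid rewording, embedding-based scores such as BERTScore or an LLM judge are the natural next step for measuring semantic quality.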

## Practical Insights

This project offers a complete reference implementation for developers building RAG systems. Key insights: retrieval quality sets the upper bound on generation quality, so optimizing retrieval is the more cost-effective lever; domain fine-tuning clearly benefits embedding models; and a sensible quantization strategy lowers the hardware requirements (for example, the system runs on a T4 GPU).
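A back-of-the-envelope estimate shows why quantization matters for these hardware claims. The parameter count below (about 7.25 billion for Mistral-7B) is an approximation. The estimate covers the weights only; quantization constants, the KV cache, activations, and the CUDA context all come on top.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Memory taken by the model weights alone, in decimal gigabytes."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 7.25e9  # approximate parameter count of Mistral-7B (assumption)

for label, bits in [("fp16", 16), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gb(N_PARAMS, bits):.1f} GB")
# fp16: 14.5 GB
# 4-bit: 3.6 GB
```

At 16 bits the weights alone exceed a 6 GB card. At 4 bits they take roughly 3.6 GB, which leaves headroom on a 6 GB GPU and ample room on a 16 GB T4.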
