Section 01
【Introduction】RAG in Practice with Open-source LLMs for Biomedical Data Retrieval: A Multi-model Comparative Study
This study builds a retrieval-augmented generation (RAG) system over microbiome sample data and compares how four language models perform in it: GPT (closed-source), Llama, OLMo, and Pythia. Each model is evaluated along multiple dimensions using the RAGAS framework. The work aims to address the complexity of querying data in the biomedical field, to explore the potential of open-source LLMs in specialized scenarios, and to provide reusable RAG templates and an evaluation methodology for the field.