Section 01
[Introduction] New Evaluation Framework for Medical Large Language Models: Retrieval-Augmented Six-Dimensional System
This article presents LLMs-Healthcare-Evaluation, an open-source evaluation framework for medical LLMs built around the core idea of "retrieval-augmented evaluation". By grounding model outputs against authoritative biomedical literature, it assesses performance across six dimensions: correctness, hallucination resistance, completeness, faithfulness, evidence-basedness, and empathy. This addresses the limitations of traditional evaluations that rely on a single metric or on laboratory settings, supporting the selection, optimization, and regulation of medical AI.
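To make the "retrieval-augmented evaluation" idea concrete, here is a minimal, self-contained sketch of the pipeline shape: retrieve reference passages for a question, then score the model's answer against that evidence across the six dimensions. Everything here is illustrative, not the framework's actual API: the toy token-overlap retriever and the single `overlap_score` reused for all dimensions are hypothetical stand-ins for the literature retrieval and per-dimension scoring the article describes.

```python
# Illustrative sketch only; function names and scoring logic are assumptions,
# not the actual LLMs-Healthcare-Evaluation implementation.

# The six dimensions named in the article.
DIMENSIONS = [
    "correctness", "hallucination_resistance", "completeness",
    "faithfulness", "evidence_basedness", "empathy",
]

def tokenize(text: str) -> set:
    # Naive whitespace tokenization; a real framework would use proper NLP.
    return set(text.lower().split())

def retrieve(question: str, corpus: list, k: int = 2) -> list:
    # Toy retriever: rank passages by token overlap with the question.
    # A real system would query an indexed biomedical literature base.
    q = tokenize(question)
    ranked = sorted(corpus, key=lambda p: len(q & tokenize(p)), reverse=True)
    return ranked[:k]

def overlap_score(answer: str, evidence: list) -> float:
    # Fraction of answer tokens supported by the retrieved evidence
    # (a crude proxy for literature grounding).
    a = tokenize(answer)
    e = set().union(*(tokenize(p) for p in evidence)) if evidence else set()
    return len(a & e) / len(a) if a else 0.0

def evaluate(question: str, answer: str, corpus: list) -> dict:
    # Placeholder: apply the same support score to every dimension;
    # the real framework scores each dimension with its own criteria.
    evidence = retrieve(question, corpus)
    support = overlap_score(answer, evidence)
    return {dim: round(support, 2) for dim in DIMENSIONS}
```

A usage sketch: `evaluate("first-line therapy for type 2 diabetes?", "metformin is first-line", literature_passages)` returns one score per dimension in [0, 1], making models directly comparable on each axis rather than on a single aggregate metric.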