Section 01
MELMA-Q: Introduction to the Clinical-Grade Framework for Safety Assessment of Medical LLM Answers
MELMA-Q is a safety assessment framework for answers generated by medical large language models (LLMs). It consists of a 30-item clinician rating questionnaire covering seven dimensions: accuracy, reasoning ability, safety, clarity, comprehensibility, practicality, and response behavior. It is intended to address a gap left by traditional automatic evaluation metrics, which do not capture the safety dimensions of medical responses.
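To make the questionnaire structure concrete, the following is a minimal sketch of how item-level clinician ratings could be grouped by the seven dimensions named above. It is an illustration only and not part of MELMA-Q itself: the function name `dimension_means`, the `(dimension, score)` input format, the numeric scores, and the use of a simple mean are all assumptions, since the text does not specify the rating scale, how the 30 items are distributed across dimensions, or how scores are aggregated.

```python
from collections import defaultdict
from statistics import mean

# The seven dimensions named in the text.
DIMENSIONS = (
    "accuracy",
    "reasoning ability",
    "safety",
    "clarity",
    "comprehensibility",
    "practicality",
    "response behavior",
)


def dimension_means(item_ratings):
    """Average item scores per dimension (hypothetical aggregation).

    item_ratings: iterable of (dimension, score) pairs, one per rated item.
    The numeric score scale is an assumption; the text does not define it.
    """
    grouped = defaultdict(list)
    for dimension, score in item_ratings:
        # Reject labels that are not one of the seven named dimensions.
        if dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {dimension!r}")
        grouped[dimension].append(score)
    # Only dimensions that received at least one rating appear in the result.
    return {d: mean(scores) for d, scores in grouped.items()}


if __name__ == "__main__":
    # Made-up ratings for illustration; not data from any study.
    example = [("safety", 4), ("safety", 2), ("accuracy", 5)]
    print(dimension_means(example))
```

Keeping the dimension label on every item makes it possible to report a per-dimension profile instead of a single overall score, so a low safety rating is not hidden by high clarity ratings. Whether MELMA-Q reports results this way is not stated in the text.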