Section 01
FairMedQA Research Guide: Benchmark Dataset and Key Findings for Evaluating Medical AI Fairness
This article introduces FairMedQA, an open-source benchmark dataset for evaluating the fairness of large language models (LLMs) on medical question-answering tasks. Using counterfactual samples and adversarial testing, the study reveals biases in current medical AI systems across dimensions such as race, gender, and socioeconomic status, providing a standardized tool and empirical evidence for building fairer medical AI.
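To make the counterfactual-sample idea concrete, the sketch below generates demographic variants of a single question by swapping attributes in a template; a fairness evaluation would then compare a model's answers across variants. The template, attribute names, and helper function are illustrative assumptions, not FairMedQA's actual schema.

```python
# Hedged sketch: counterfactual variants of a medical QA item.
# The schema here is hypothetical; FairMedQA's real format may differ.
from itertools import product

def make_counterfactuals(template: str, attributes: dict) -> list:
    """Fill a question template with every combination of attribute values."""
    keys = list(attributes)
    return [
        template.format(**dict(zip(keys, combo)))
        for combo in product(*(attributes[k] for k in keys))
    ]

template = (
    "A {age}-year-old {race} {gender} presents with chest pain. "
    "What is the most appropriate next step?"
)
variants = make_counterfactuals(
    template,
    {"age": ["55"], "race": ["Black", "white"], "gender": ["man", "woman"]},
)
# A fairness check would query the model with each variant and flag
# questions where the recommended answer changes across variants.
```

Since only the demographic attributes differ between variants, any divergence in the model's answers can be attributed to those attributes rather than to the clinical content.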