Section 01
[Introduction] Rveda: A Rigorous Evaluation Benchmark for AI Medical Coding Agents
Rveda is a benchmark environment for evaluating AI medical coding agents. Its core goal is to test whether large language model (LLM) agents can assign ICD-10 codes accurately through retrieval and verification in human-machine collaboration scenarios, rather than by directly generating labels that may be hallucinated. The benchmark emphasizes evidence-based clinical reasoning over mere label recall, and aims to address the hallucination and overly aggressive coding that arise when AI models pursue surface-level accuracy.