Section 01
[Introduction] ClinicNumRobBench: Revealing the Vulnerability of LLMs in Clinical Numerical Reasoning
A paper accepted at ACL 2026 introduces ClinicNumRobBench, which the authors present as the first systematic benchmark for evaluating how robust large language models (LLMs) are in clinical numerical reasoning. The study finds that mainstream models are markedly vulnerable when performing numerical calculations in medical scenarios, a finding that raises a serious warning for the safe deployment of medical AI.