Section 01
[Introduction] LLM-CAT: Efficient Evaluation of Large Models' Medical Capabilities Using Computerized Adaptive Testing
The LLM-CAT project applies Computerized Adaptive Testing (CAT) to medical benchmark evaluation of large language models. Its core goal is to keep the assessment of a model's medical knowledge accurate while significantly reducing the number of evaluation questions, which addresses the high compute and time costs of traditional fixed-question testing.