Section 01
Introduction to ScanFormer: A Multimodal Medical Image Report Generation Model Fine-Tuned with LoRA
ScanFormer is an independent research project by Divya Rahul Shah, an undergraduate at the Indian Institute of Technology Gandhinagar (IIT Gandhinagar). It combines a multimodal large language model with parameter-efficient fine-tuning to build a practical system for generating medical image reports. The model builds on the LLaVA-Med vision-language architecture. It is fine-tuned with LoRA (Low-Rank Adaptation), which trains only about 2% of the parameters, and it uses EWC (Elastic Weight Consolidation) to limit catastrophic forgetting. Trained on the CheXpert dataset (224,316 chest X-ray images), it generates radiology reports automatically. Key reported results are a BLEU-4 score of 38.4 for report quality, clinical factuality of 89.7%, retention of 96.2% of general language ability, and a hallucination rate of 4.1%.
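The two techniques named above can be illustrated with a minimal sketch. It is not ScanFormer's code: the function names, layer size, rank, and numeric values below are hypothetical and chosen only to show the arithmetic. The first function counts how small the trainable share becomes when a frozen weight matrix W is adapted as W + BA with low-rank factors B and A. The second computes the standard EWC penalty, which discourages parameters from drifting away from their pre-fine-tuning values in proportion to their estimated importance (the Fisher information).

```python
def lora_trainable_fraction(d_in, d_out, rank):
    """Share of parameters trained when a frozen d_out x d_in weight W
    is adapted as W + B @ A, with B (d_out x rank) and A (rank x d_in)."""
    frozen = d_in * d_out                # original weights, kept fixed
    trainable = rank * (d_in + d_out)    # low-rank factors A and B
    return trainable / (frozen + trainable)


def ewc_penalty(params, anchor_params, fisher, lam):
    """EWC regularizer: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2,
    where theta* are the parameters before fine-tuning and F_i is the
    per-parameter Fisher information estimate."""
    return 0.5 * lam * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher)
    )


if __name__ == "__main__":
    # Hypothetical layer size and rank, for illustration only.
    frac = lora_trainable_fraction(d_in=4096, d_out=4096, rank=64)
    print(f"trainable fraction for one adapted layer: {frac:.2%}")

    # Hypothetical two-parameter model: only the first parameter moved.
    penalty = ewc_penalty(
        params=[1.0, 2.0],
        anchor_params=[0.0, 2.0],
        fisher=[4.0, 10.0],
        lam=0.5,
    )
    print(f"EWC penalty: {penalty}")
```

In a training loop, a penalty of this form would typically be added to the report-generation loss, so that parameters with high Fisher values stay close to their original settings while the low-rank factors absorb the task-specific update. The overall trainable share of a full model also depends on which layers receive adapters, so the single-layer figure here need not match the roughly 2% cited for ScanFormer.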