Zing Forum

DAFT: Building a Medical Report Interpretation System with 1.1B-Parameter TinyLLaMA, Hallucination Rate Only 2.1%

The DAFT project demonstrates how domain-adaptive fine-tuning and a hybrid architecture enable a small model to outperform larger baseline models in medical scenarios, achieving a production-grade medical AI application with 97.9% accuracy and a hallucination rate of only 2.1%.

Tags: TinyLLaMA · LoRA · Medical AI · Blood Testing · Model Fine-Tuning · Hallucination Rate · Lightweight Models · Domain Adaptation · Health Tech · Open-Source Healthcare
Published 2026-05-13 07:41 · Recent activity 2026-05-13 07:47 · Estimated read 5 min
Section 01

DAFT Project Guide: Building a Low-Hallucination Medical Report Interpretation System with 1.1B-Parameter TinyLLaMA

The medical AI field faces a dilemma: large models are costly to deploy and prone to hallucination, while small models lack domain expertise. Through domain-adaptive fine-tuning and a hybrid architecture, the DAFT project uses the 1.1B-parameter TinyLLaMA to reach 97.9% accuracy and a 2.1% hallucination rate on blood test report interpretation, significantly outperforming medical-domain baselines such as BioBERT and ClinicalBERT and offering a feasible path to lightweight medical AI applications.

Section 02

Project Background: Readability Crisis of Medical Reports and Limitations of Existing Solutions

Blood test reports are an important basis for medical diagnosis, yet over 60% of patients cannot interpret them accurately, causing anxiety and increasing doctor-patient communication costs. Manual interpretation is inefficient, and general-purpose large models are hard to deploy directly: medical scenarios demand extremely high accuracy, while those models carry hallucination risks.

Section 03

Core Innovation: Hybrid Architecture Design Balances Accuracy and Fluency

DAFT adopts a hybrid architecture: a deterministic component extracts numerical values with 100% accuracy using regular expressions and a rule engine, while a generative component, built on the fine-tuned TinyLLaMA, converts the structured data into patient-friendly explanations. This layered design preserves medical rigor while keeping the output humane and readable.
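As a minimal sketch of what the deterministic component could look like (the analyte names, line format, and reference ranges here are illustrative assumptions, not the project's actual code), a regex-based extractor might parse OCR'd report lines and flag out-of-range values like this:

```python
import re

# Illustrative reference ranges; real ranges depend on the lab, age, and sex.
REFERENCE_RANGES = {
    "WBC": (4.0, 10.0),    # 10^9/L
    "HGB": (120.0, 160.0), # g/L
}

# Matches lines like "WBC  11.2  10^9/L" at the start of an OCR'd report line.
LINE_PATTERN = re.compile(r"^(?P<analyte>[A-Z]+)\s+(?P<value>\d+(?:\.\d+)?)")

def extract_results(report_text):
    """Deterministically parse analyte values and flag out-of-range ones."""
    results = []
    for line in report_text.splitlines():
        m = LINE_PATTERN.match(line.strip())
        if not m or m.group("analyte") not in REFERENCE_RANGES:
            continue  # skip headers, footers, and unknown analytes
        name = m.group("analyte")
        value = float(m.group("value"))
        low, high = REFERENCE_RANGES[name]
        status = "low" if value < low else "high" if value > high else "normal"
        results.append({"analyte": name, "value": value, "status": status})
    return results
```

Because this stage never guesses, its output can be handed to the generative component as ground truth: the model explains values it is given rather than inventing them.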

Section 04

Technical Implementation: LoRA Fine-Tuning Enables Small Models to Have Medical Professional Capabilities

DAFT selects the 1.1B-parameter TinyLLaMA and injects domain capability via LoRA fine-tuning (r=16, α=32). The training data comprises 850 manually labeled samples (split 8:1:1), cross-validated by three medical experts with inter-annotator agreement of κ = 0.83. Performance saturates beyond roughly 500 samples, indicating high data efficiency.
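The LoRA idea behind those hyperparameters can be sketched in a few lines of NumPy (a toy illustration of the math, not the project's training code): the frozen weight W is adapted by a low-rank update scaled by α/r, and only the small matrices A and B are trained.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 64, 16, 32  # r=16, alpha=32 as in the article

W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight (not trained)
A = rng.normal(size=(r, d_in)) * 0.01   # trainable, small random init
B = np.zeros((d_out, r))                # trainable, zero init

def lora_forward(x):
    # Effective weight is W + (alpha/r) * B @ A; since B starts at zero,
    # the adapted layer is initially identical to the base layer.
    return W @ x + (alpha / r) * (B @ (A @ x))
```

Only A and B are updated, so the trainable parameter count per layer is r·(d_in + d_out) instead of d_in·d_out, which is what makes fine-tuning feasible on a consumer GPU.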

Section 05

End-to-End System: Complete Process from PDF to Friendly Report

The system supports PDF and image uploads. The pipeline runs OCR recognition, numerical parsing, anomaly detection, intelligent interpretation, and result presentation, taking about 2.3 seconds end to end. The tech stack is React + TypeScript on the frontend and FastAPI on the backend, with the model deployed on Hugging Face Spaces.

Section 06

Performance Verification: Impressive Results Surpassing Medical Large Model Baselines

In a triple-blind evaluation by five medical experts, DAFT achieves a 2.1% hallucination rate and 97.9% accuracy, significantly outperforming BioBERT (9.4% hallucination rate), ClinicalBERT (11.8%), and BioGPT (7.1%). In robustness tests, accuracy across laboratory report formats ranges from 87.5% to 100%, and the system maintains 94.7% accuracy even with 5% OCR errors.

Section 07

Clinical Significance and Ethical Considerations: Positioned as an Educational Auxiliary Tool

DAFT can be trained and deployed on consumer-grade hardware (a 12GB-VRAM GPU), promoting inclusiveness in medical AI. The system clearly states that it does not replace professional medical advice, reflecting its ethical responsibility, and is positioned as an educational auxiliary tool for improving patients' health literacy.

Section 08

Open Source and Future Outlook: Promoting the Democratization of Medical AI

DAFT's open-source release includes training scripts, model weights, and deployment guidelines, and the accompanying paper was published at the ICCET international conference. Future work could extend the system to more types of medical reports, support multiple languages, and integrate personalized recommendations, offering a template for small-model applications in vertical domains.