Zing Forum


RetinaScan: A Multimodal AI Diagnostic System for Retinal Diseases Based on EfficientNet-B4

RetinaScan is a full-stack medical web application that uses a fine-tuned EfficientNet-B4 model to classify the severity of diabetic retinopathy from fundus images. Combined with Grad-CAM interpretability and Gemini LLM clinical insights, it provides a fast and accessible AI-assisted diagnostic solution for early screening.

Tags: Medical AI, Diabetic Retinopathy, Fundus Images, EfficientNet, Deep Learning, Explainable AI, Grad-CAM, Multimodal AI, FastAPI, PyTorch
Published 2026-06-10 02:30 | Recent activity 2026-06-10 02:53 | Estimated read 6 min

Section 01

Introduction: Overview of the RetinaScan Multimodal AI Diagnostic System for Retinal Diseases

RetinaScan is a full-stack medical web application for AI-assisted diagnosis of diabetic retinopathy (DR). It uses a fine-tuned EfficientNet-B4 model to classify DR severity, adds Grad-CAM heatmaps for interpretability, and calls the Gemini large language model to generate clinical insights. The goal is a fast, accessible tool for early screening that bridges the gap between clinical imaging and AI diagnosis.


Section 02

Project Background: Urgent Need for Diabetic Retinopathy Screening

Diabetic retinopathy is one of the leading causes of blindness, but early detection can significantly improve prognosis. In practice, a shortage of ophthalmologists and cumbersome screening processes hinder early detection. RetinaScan aims to simplify screening with AI so that non-specialists can operate it, improving the accessibility and efficiency of early DR screening.


Section 03

Technical Architecture and Core Methods

RetinaScan adopts an end-to-end full-stack architecture:

  • AI Workflow: Image upload → Preprocessing → EfficientNet-B4 inference → Grading + Confidence → Grad-CAM heatmap → Gemini clinical insights → Result return.
  • Model Details: An ImageNet-pretrained EfficientNet-B4, fine-tuned on the APTOS 2019 dataset with a weighted cross-entropy loss to handle class imbalance; the input size is 380×380.
  • Tech Stack: Front-end React + Tailwind, back-end FastAPI + PostgreSQL, AI components PyTorch + Grad-CAM + Gemini API.
  • API Design: Provides POST /predict (image diagnosis) and GET /history (history records) endpoints.
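
The weighted cross-entropy mentioned under Model Details can be illustrated with a small standard-library sketch. The source does not say how RetinaScan derives its class weights, so the inverse-frequency scheme, the function names, and the per-grade image counts below are all assumptions made for illustration. The loss is normalised by the sum of the weights used, which is also how PyTorch's `CrossEntropyLoss` behaves with a `weight` tensor and mean reduction.

```python
import math


def inverse_frequency_weights(class_counts):
    """Weight each class by total / (num_classes * count), so rare grades count more.

    This weighting scheme is an assumption; the source only says the loss is weighted.
    """
    total = sum(class_counts)
    num_classes = len(class_counts)
    return [total / (num_classes * count) for count in class_counts]


def weighted_cross_entropy(logits_batch, targets, weights):
    """Weighted mean of per-sample cross-entropy, normalised by the sum of weights used."""
    loss_sum = 0.0
    weight_sum = 0.0
    for logits, target in zip(logits_batch, targets):
        # Numerically stable log-sum-exp of the logits.
        peak = max(logits)
        log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Negative log-likelihood of the true class.
        nll = log_z - logits[target]
        loss_sum += weights[target] * nll
        weight_sum += weights[target]
    return loss_sum / weight_sum


# Hypothetical per-grade image counts (grades 0-4), NOT the real APTOS 2019 statistics.
counts = [500, 100, 250, 50, 100]
weights = inverse_frequency_weights(counts)  # -> [0.4, 2.0, 0.8, 4.0, 2.0]

# Two made-up samples: a confident grade-0 prediction and an uncertain grade-3 one.
batch = [[4.0, 0.5, 0.2, 0.1, 0.1], [0.3, 0.2, 0.4, 0.5, 0.1]]
loss = weighted_cross_entropy(batch, [0, 3], weights)
```

With these weights, a mistake on the rare grade 3 contributes ten times as much to the loss as the same mistake on the common grade 0, which is the intended effect of weighting under class imbalance.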

Section 04

Core Features: Multimodal Diagnosis and Interpretability

  1. DR Grading Diagnosis: Classifies DR into levels 0-4 (no DR to proliferative DR) and returns a confidence score.
  2. Grad-CAM Interpretability: Generates heatmaps that show which regions of the fundus image most influenced the prediction, supporting clinician trust and clinical validation.
  3. Gemini LLM Clinical Insights: Converts classification results into actionable recommendations (e.g., "Moderate DR: recheck recommended in 3-6 months") to improve practical value.
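
The first two features can be sketched with the standard library alone. This is a minimal illustration and not RetinaScan's actual code: it assumes the confidence score is the softmax probability of the predicted grade, that the intermediate grade names follow the common five-level DR scale, and that Grad-CAM receives the last convolutional layer's activations and gradients as nested lists. All function names are hypothetical.

```python
import math

# Grades 0 and 4 come from the article; the intermediate names are assumed.
DR_GRADES = ["No DR", "Mild", "Moderate", "Severe", "Proliferative DR"]


def grade_with_confidence(logits):
    """Return (grade index, grade name, softmax probability of that grade)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # shift by max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    grade = max(range(len(probs)), key=probs.__getitem__)
    return grade, DR_GRADES[grade], probs[grade]


def grad_cam(activations, gradients):
    """Compute a Grad-CAM heatmap scaled to [0, 1].

    activations, gradients: [channels][height][width] nested lists, where
    gradients holds d(score of predicted class) / d(activation).
    """
    channels = len(activations)
    height = len(activations[0])
    width = len(activations[0][0])
    # Channel importance: global average of the gradients over the spatial grid.
    alphas = [sum(sum(row) for row in grad) / (height * width) for grad in gradients]
    # Weighted sum of activation maps, then ReLU to keep positive evidence only.
    cam = [
        [
            max(0.0, sum(alphas[k] * activations[k][i][j] for k in range(channels)))
            for j in range(width)
        ]
        for i in range(height)
    ]
    # Scale to [0, 1] so the map can be overlaid on the input image.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam


grade, name, confidence = grade_with_confidence([0.0, 0.0, 5.0, 0.0, 0.0])

# Toy 2-channel, 2x2 example: channel 0 supports the class, channel 1 opposes it.
acts = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
heatmap = grad_cam(acts, grads)  # -> [[1.0, 0.0], [0.0, 0.0]]
```

In a real pipeline the low-resolution heatmap would still need to be upsampled to the 380×380 input and blended with the fundus image before display.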

Section 05

Application Scenarios and Value

  • Early Screening: In community health centers and telemedicine scenarios, non-professionals can quickly screen high-risk cases.
  • Clinical Assistance: Provides second opinions for ophthalmologists, improves diagnostic efficiency, and serves as a teaching tool to help medical students understand DR grading.
  • Research Support: Facilitates epidemiological surveys, model optimization, and multi-center validation.

Section 06

Limitations and Future Improvement Directions

Current Limitations: The model is trained only on the APTOS 2019 dataset, so its population representativeness is limited; it covers DR as a single disease; and results depend on image quality, which varies across capture devices.

Future Directions: Expand to other diseases (glaucoma, macular degeneration), integrate additional modalities such as OCT, use federated learning to protect privacy, optimize for mobile devices, and enable real-time video analysis.


Section 07

Summary and Outlook: Practical Exploration of Medical AI

RetinaScan is a solid example of open-source medical AI in practice. Its highlights are an end-to-end full-stack implementation, integrated interpretability, multimodal fusion, and open-source reproducibility. It gives medical AI developers a clear learning path and offers an efficient, accessible approach to DR screening. As the technology matures, projects like this can help promote the responsible application of AI in medicine.