RetinaScan-UNet: A Deep Learning Solution for Retinal Vessel Segmentation Based on U-Net

A project from the Artificial Intelligence course at the University of Seville, using a custom U-Net architecture and 5-fold cross-validation to achieve high-precision automatic segmentation of retinal vessels on the DRIVE 2004 dataset.

Tags: U-Net, retinal vessel segmentation, deep learning, medical imaging, DRIVE dataset, Keras, computer vision, image segmentation

Published 2026-06-06 00:40 | Recent activity 2026-06-06 00:50 | Estimated read 9 min

Section 01

Introduction: Core Overview of the RetinaScan-UNet Project

RetinaScan-UNet is a project from the Artificial Intelligence Software Engineering course (2025/2026 academic year) at the University of Seville. It is maintained by marcosayalab, was released on June 5, 2026, and is open source at https://github.com/marcosayalab/RetinaScan-UNet. The project combines a custom U-Net architecture with 5-fold cross-validation to segment retinal vessels automatically on the DRIVE 2004 dataset, and it covers the full pipeline from preprocessing to post-processing.

Section 02

Project Background and Significance

Retinal vessel segmentation is a fundamental task in computer-aided diagnosis and eye care. Accurate extraction of vessel structures gives ophthalmologists quantitative biomarkers for monitoring and diagnosing diseases such as diabetic retinopathy (microaneurysms, neovascularization), hypertensive retinopathy (arteriolar narrowing), glaucoma (vascular changes related to optic nerve damage), and age-related macular degeneration (assessment of vascular integrity). Medical image datasets are small and expensive to annotate, so the project combines data augmentation, patch-based training, a skip-connection architecture, cross-validation, and post-inference processing to extract vessels accurately at a manageable computational cost.

Section 03

Dataset Introduction: DRIVE 2004 Benchmark Dataset

The project uses the DRIVE 2004 dataset, a widely used benchmark for retinal vessel segmentation. It contains 40 retinal fundus images of 565×584 pixels (width × height), divided into 20 training images and 20 test images. Each image comes with the original RGB fundus image and a Field of View (FoV) mask. In the standard DRIVE release, the training images carry one manual vessel annotation, while the test images carry annotations from two independent observers. The second annotation makes it possible to measure inter-observer variability, which gives a realistic reference point for the performance an algorithm can be expected to reach.

Section 04

Technical Architecture: Detailed Explanation of Custom U-Net

The project implements a custom U-Net architecture (encoder-decoder structure) based on the Keras Functional API:

Encoder block: two 3×3 Conv2D layers (ReLU activation, He initialization) followed by a 2×2 MaxPooling2D layer. Each block extracts semantic features, enlarges the receptive field, and halves the spatial dimensions.

Decoder block: upsampling with UpSampling2D or Conv2DTranspose, a skip connection (Concatenate) to the encoder features of the same resolution, and additional convolution layers for refinement. Each block restores spatial resolution and improves localization accuracy.

Advantages of skip connections: they carry fine vessel detail past the downsampling path, which helps produce sharp boundaries and reconstruct capillary structures.
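The encoder, decoder, and skip-connection pattern described above can be sketched with the Keras Functional API as follows. This is a minimal illustration, not the project's actual model: the names conv_block and build_unet, the depth of 3, the 16 base filters, and the 128×128 input size are all assumptions chosen to keep the example small.

```python
import keras
from keras import layers


def conv_block(x, filters):
    """Two 3x3 convolutions with ReLU activation and He initialization."""
    for _ in range(2):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu",
                          kernel_initializer="he_normal")(x)
    return x


def build_unet(input_shape=(128, 128, 3), base_filters=16, depth=3):
    """Small U-Net; base_filters and depth are illustrative assumptions."""
    inputs = keras.Input(shape=input_shape)
    x = inputs
    skips = []

    # Encoder: conv block, keep the features for the skip, then downsample.
    for level in range(depth):
        x = conv_block(x, base_filters * 2 ** level)
        skips.append(x)
        x = layers.MaxPooling2D(2)(x)

    # Bottleneck at the lowest resolution.
    x = conv_block(x, base_filters * 2 ** depth)

    # Decoder: upsample, concatenate the matching encoder features, refine.
    for level in reversed(range(depth)):
        filters = base_filters * 2 ** level
        x = layers.Conv2DTranspose(filters, 2, strides=2, padding="same")(x)
        x = layers.Concatenate()([x, skips[level]])
        x = conv_block(x, filters)

    # One sigmoid channel: per-pixel vessel probability.
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)
    return keras.Model(inputs, outputs, name="unet_sketch")


model = build_unet()
```

With "same" padding, the output keeps the input's height and width as long as both are divisible by 2 to the power of depth, which is one reason patch sizes such as 128 or 256 are convenient.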

Section 05

Training Strategy and Optimization Measures

Patch-Based Training

High-resolution images are divided into patches of 128×128 or 256×256 pixels. Images are padded dynamically so their size is compatible with the patch size, and the padding is removed again when the full image is reconstructed. Advantages: more training samples, lower memory requirements, support for larger batches, and more stable training.
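The pad, split, and reconstruct cycle can be sketched in NumPy as below. The function names, the zero padding on the bottom and right edges, and the non-overlapping tiling are assumptions made for illustration; the project may pad or tile differently.

```python
import numpy as np


def pad_to_multiple(image, patch):
    """Zero-pad the bottom and right so height and width are multiples of patch."""
    h, w = image.shape[:2]
    pad_h = (-h) % patch
    pad_w = (-w) % patch
    widths = [(0, pad_h), (0, pad_w)] + [(0, 0)] * (image.ndim - 2)
    return np.pad(image, widths, mode="constant"), (h, w)


def split_into_patches(image, patch):
    """Cut a padded image into non-overlapping patch x patch tiles (row-major)."""
    h, w = image.shape[:2]
    return np.stack([image[r:r + patch, c:c + patch]
                     for r in range(0, h, patch)
                     for c in range(0, w, patch)])


def reassemble(patches, padded_shape, original_hw):
    """Put tiles back in row-major order, then crop the padding away."""
    patch = patches.shape[1]
    out = np.zeros(padded_shape, dtype=patches.dtype)
    i = 0
    for r in range(0, padded_shape[0], patch):
        for c in range(0, padded_shape[1], patch):
            out[r:r + patch, c:c + patch] = patches[i]
            i += 1
    return out[:original_hw[0], :original_hw[1]]


# A random stand-in for one DRIVE-sized image (584 rows, 565 columns, RGB).
image = np.random.default_rng(0).random((584, 565, 3))
padded, original_hw = pad_to_multiple(image, 128)
patches = split_into_patches(padded, 128)
restored = reassemble(patches, padded.shape, original_hw)
```

In this sketch a 584×565 image is padded to 640×640 and yields 25 patches of 128×128, and cropping after reassembly recovers the original image exactly.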

Data Augmentation

Horizontal flips, vertical flips, random rotations, and contrast perturbations are applied on the fly during training, which increases data diversity without storing extra copies of the images.
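A minimal sketch of such an on-the-fly augmentation step is shown below. It is written under several assumptions: rotations are restricted to multiples of 90 degrees (so square patches need no interpolation), the contrast factor is drawn from 0.8 to 1.2, and image values lie in [0, 1]. None of these specifics come from the project itself. The key point is that geometric transforms must be applied identically to the image and its vessel mask, while intensity changes apply to the image only.

```python
import numpy as np


def augment(image, mask, rng):
    """Apply the same random geometric transform to an image patch and its mask."""
    if rng.random() < 0.5:                      # horizontal flip
        image, mask = image[:, ::-1], mask[:, ::-1]
    if rng.random() < 0.5:                      # vertical flip
        image, mask = image[::-1], mask[::-1]
    k = int(rng.integers(0, 4))                 # rotation by k * 90 degrees
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    # Contrast perturbation touches the image only; labels stay unchanged.
    factor = rng.uniform(0.8, 1.2)
    mean = image.mean()
    image = np.clip((image - mean) * factor + mean, 0.0, 1.0)
    return np.ascontiguousarray(image), np.ascontiguousarray(mask)
```

Calling augment(patch, patch_mask, np.random.default_rng()) inside the batch generator produces a fresh variant each time a patch is drawn.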

5-Fold Cross-Validation

The training data is split into five folds. Each fold serves once as the validation set while the other four are used for training, which reduces sampling bias and gives a more robust estimate of model performance.
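The fold rotation can be sketched as follows. The helper name kfold_indices and the fixed seed are assumptions for illustration. The sketch splits at the image level, on the assumption that patches are cut afterwards, so patches from one image never land in both the training and validation sets.

```python
import numpy as np


def kfold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    order = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(order, n_folds)
    for i in range(n_folds):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        yield train_idx, val_idx


# 20 training images, as in the DRIVE training set.
splits = list(kfold_indices(20))
```

For 20 training images this gives five splits of 16 training and 4 validation images each.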

Training Configuration

Optimizer: Adam; Loss function: Binary Cross-Entropy; Framework: Keras 3/TensorFlow; Callback: EarlyStopping.
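To make two of these settings concrete, the sketch below shows in plain NumPy what binary cross-entropy computes and how patience-based early stopping decides when to halt. It illustrates the ideas only and is not the Keras implementation; the function names, the clipping constant, and the patience value of 3 are assumptions.

```python
import numpy as np


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all pixels."""
    # Clip probabilities so log() never receives exactly 0 or 1.
    p = np.clip(y_prob, eps, 1.0 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))))


def early_stopping_epoch(val_losses, patience=3):
    """Return the epoch index at which training would stop, or None."""
    best, waited = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, waited = loss, 0      # improvement: reset the counter
        else:
            waited += 1                 # no improvement this epoch
            if waited >= patience:
                return epoch
    return None
```

A prediction of 0.5 for every pixel gives a loss of ln 2 (about 0.693), a useful baseline when reading training logs.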

Section 06

Optimization Techniques in Inference Phase

Test-Time Augmentation (TTA)

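Transformations such as flips and rotations are applied to the input image, the model predicts on each transformed copy, and each prediction is mapped back to the original orientation before the results are averaged. This makes the final prediction more stable.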
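A minimal sketch of this averaging is given below. The name predict_with_tta and the particular set of four transforms are assumptions; predict_fn stands for any callable that maps an H×W×C image to an H×W probability map. The important detail is that each prediction is passed through the inverse transform before averaging.

```python
import numpy as np


def predict_with_tta(predict_fn, image):
    """Average predictions over flips and a 90-degree rotation of one image."""
    # Each entry pairs a transform with the inverse that maps a prediction back.
    transforms = [
        (lambda a: a, lambda a: a),                           # identity
        (lambda a: a[:, ::-1], lambda a: a[:, ::-1]),         # horizontal flip
        (lambda a: a[::-1], lambda a: a[::-1]),               # vertical flip
        (lambda a: np.rot90(a, 1), lambda a: np.rot90(a, -1)),
    ]
    predictions = [undo(predict_fn(apply(image))) for apply, undo in transforms]
    return np.mean(predictions, axis=0)
```

For a Keras model, predict_fn could wrap model.predict by adding and removing the batch and channel axes.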

Adaptive Otsu Thresholding

The model's output probability map is binarized with Otsu's method, which selects the threshold automatically for each image and so adapts to different brightness distributions.
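Otsu's method picks the threshold that maximizes the variance between the two resulting classes. The NumPy sketch below applies it to a probability map with values in [0, 1]; the function name and the 256-bin histogram are assumptions, and the project may instead rely on a library routine.

```python
import numpy as np


def otsu_threshold(prob_map, bins=256):
    """Return the threshold in [0, 1] that maximizes between-class variance."""
    hist, edges = np.histogram(prob_map, bins=bins, range=(0.0, 1.0))
    centers = (edges[:-1] + edges[1:]) / 2.0
    weights = hist / hist.sum()

    w0 = np.cumsum(weights)                 # weight of the class up to each bin
    w1 = 1.0 - w0                           # weight of the class above it
    cum_mean = np.cumsum(weights * centers)
    total_mean = cum_mean[-1]

    # Between-class variance; bins where one class is empty score zero.
    valid = (w0 > 0) & (w1 > 0)
    variance = np.zeros(bins)
    variance[valid] = ((total_mean * w0[valid] - cum_mean[valid]) ** 2
                       / (w0[valid] * w1[valid]))
    # Use the upper edge of the best bin as the cut-off.
    return float(edges[np.argmax(variance) + 1])
```

A binary vessel mask then follows from prob_map > otsu_threshold(prob_map).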

Morphological Post-Processing

Opening and closing operations are applied to the binary mask to remove isolated noise pixels, fill small holes, and smooth vessel boundaries.
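Opening and closing are compositions of erosion and dilation. The sketch below implements them in NumPy for a boolean mask with a 3×3 square structuring element; the element size and the border handling are assumptions, not the project's settings.

```python
import numpy as np


def _neighborhood_stack(mask, pad_value):
    """Stack the nine 3x3-shifted views of a boolean mask."""
    padded = np.pad(mask, 1, mode="constant", constant_values=pad_value)
    h, w = mask.shape
    return np.stack([padded[r:r + h, c:c + w] for r in range(3) for c in range(3)])


def erode(mask):
    """A pixel survives only if its whole 3x3 neighborhood is foreground."""
    # Pad with True so the image border itself does not erode the mask.
    return _neighborhood_stack(mask, True).all(axis=0)


def dilate(mask):
    """A pixel turns on if any pixel in its 3x3 neighborhood is foreground."""
    return _neighborhood_stack(mask, False).any(axis=0)


def opening(mask):
    """Erosion then dilation: removes specks smaller than the 3x3 element."""
    return dilate(erode(mask))


def closing(mask):
    """Dilation then erosion: fills holes smaller than the 3x3 element."""
    return erode(dilate(mask))
```

One caveat worth keeping in mind: a 3×3 opening also removes structures that are only one or two pixels wide, so on thin capillaries it may be preferable to apply closing alone or to filter small connected components instead.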

Section 07

Project Features and Innovations

End-to-End Pipeline: covers data preprocessing, model training, and inference optimization in one workflow.
Medical Image-Specific Optimization: the training and inference steps are tailored to the challenges of retinal vessel segmentation, such as thin structures and scarce annotations.
Computational Resource-Friendly: designed to run efficiently on standard consumer-grade hardware.
Balanced Academic and Practical Value: as a course project, it serves both as a teaching demonstration and as a usable segmentation tool.

Section 08

Summary and Insights

RetinaScan-UNet demonstrates the application of the classic U-Net in medical image segmentation. Through careful training strategies and inference optimization, it achieves high-quality vessel segmentation with limited annotated data.

Insights for the medical image AI field:

Data augmentation and cross-validation are key strategies when the dataset is small;
Skip connections are crucial for preserving fine detail;
TTA and morphological post-processing can further improve segmentation quality;
A well-chosen patch strategy balances computational efficiency and model performance.

This open-source project provides valuable references for researchers and developers, promoting the application of deep learning in the field of medical diagnosis.