Zing Forum

Skin Lesion CNN Classifier: A Multi-Model Deep Learning Ensemble Scheme for Clinical Deployment

An end-to-end deep learning pipeline for automated dermoscopic image classification, using a weighted ensemble of ResNet50, DenseNet121, and EfficientNet-B3, combined with test-time augmentation (TTA) and class-specific threshold calibration, achieving a balanced accuracy (BACC) of 0.846 on the ISIC 2018 dataset.

Tags: skin lesion classification, CNN, deep learning, medical AI, melanoma, ensemble learning, ISIC, sensitivity calibration
Published 2026-06-02 07:41

Section 01

Introduction to Skin Lesion CNN Classifier: A Multi-Model Ensemble Scheme for Clinical Deployment

Original Author/Maintainer: daorre1202
Source Platform: GitHub
Original Title: skin-lesion-classifier-CNN
Original Link: https://github.com/daorre1202/skin-lesion-classifier-CNN
Release Time: June 1, 2026

Core Points: This project proposes an end-to-end deep learning pipeline for automated dermoscopic image classification, using a weighted ensemble of ResNet50, DenseNet121, and EfficientNet-B3, combined with Test-Time Augmentation (TTA) and class-specific threshold calibration. It achieves a mean Balanced Accuracy (BACC) of 0.846 across three random seeds on the ISIC 2018 dataset, with a focus on improving sensitivity for malignant lesions such as melanoma, and aims to make deep learning classifiers practical for clinical deployment.

Section 02

Clinical Background: Challenges in Melanoma Screening

Melanoma is the deadliest type of skin cancer, yet when it is detected early the 5-year survival rate exceeds 98%. Manual dermoscopic diagnosis, however, is subjective and depends heavily on the examiner's experience. Deep learning classifiers can assist with screening, but overall accuracy is a poor guide to clinical usefulness: because benign lesions greatly outnumber malignant ones, a model can score high accuracy while still missing many malignant lesions. The core goal of this project is therefore to substantially raise the detection rate of malignant lesions through class-specific probability threshold calibration, while preserving overall accuracy.
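To see why accuracy alone can mislead, consider the toy sketch below (hypothetical data, not from the project): on a set with 90 benign and 10 melanoma images, a classifier that labels everything benign reaches 0.9 accuracy but only 0.5 BACC, because BACC averages the recall of each class. The helper names `accuracy` and `balanced_accuracy` are illustrative; scikit-learn's `balanced_accuracy_score` computes the same quantity.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of all samples classified correctly."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall (sensitivity) over the classes present in y_true."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))

# Toy, hypothetical labels: 90 benign (0) and 10 melanoma (1) images.
y_true = np.array([0] * 90 + [1] * 10)
# A degenerate classifier that always predicts "benign" misses every melanoma.
y_pred = np.zeros_like(y_true)

print(accuracy(y_true, y_pred))           # 0.9
print(balanced_accuracy(y_true, y_pred))  # 0.5
```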

Section 03

Technical Architecture and Key Methods

  • Multi-Model Ensemble Strategy: a weighted ensemble of three pre-trained CNNs (ResNet50, DenseNet121, and EfficientNet-B3), with each model's weight proportional to its validation-set BACC.
  • Test-Time Augmentation (TTA): at inference, each image is also transformed (horizontal flip, vertical flip, rotation, etc.) and the predictions over all versions are averaged, reducing variance and making outputs more stable.
  • Clinical Threshold Calibration: class-specific probability thresholds are calibrated on the validation set only, so the test set stays untouched. The targets are sensitivity ≥ 0.85 and specificity ≥ 0.85 for melanoma (MEL), and sensitivity ≥ 0.75 and specificity ≥ 0.70 for actinic keratosis (AKIEC), with the aim of raising BACC on the malignant classes.
  • Grad-CAM Visualization: highlights the image regions that most influenced each prediction, helping doctors understand the basis of a decision and verify whether the model attends to lesion features.
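A minimal NumPy sketch of how a BACC-weighted ensemble and TTA could fit together, assuming each model is a callable that maps an (H, W, C) image to a class-probability vector. The function names, the exact set of TTA views, the toy stand-in models, and the validation BACC values passed in are all hypothetical; the project's own code may differ.

```python
import numpy as np

def tta_views(image):
    """Hypothetical TTA set: identity, horizontal flip, vertical flip, 90-degree rotation.

    `image` is an (H, W, C) array; square inputs keep the same shape after rotation.
    """
    return [
        image,
        image[:, ::-1],                     # horizontal flip (mirror left-right)
        image[::-1, :],                     # vertical flip (mirror top-bottom)
        np.rot90(image, k=1, axes=(0, 1)),  # 90-degree rotation in the image plane
    ]

def tta_predict(model, image):
    """Average one model's class-probability vectors over all TTA views."""
    return np.mean([model(view) for view in tta_views(image)], axis=0)

def ensemble_predict(models, val_baccs, image):
    """Weighted average of per-model TTA probabilities.

    Weights are proportional to each model's validation BACC and normalized to sum to 1.
    """
    weights = np.asarray(val_baccs, dtype=float)
    weights = weights / weights.sum()
    probs = np.stack([tta_predict(m, image) for m in models])  # (n_models, n_classes)
    return weights @ probs

# Toy stand-ins for ResNet50, DenseNet121 and EfficientNet-B3 (hypothetical):
# each ignores the image and returns fixed 3-class probabilities.
models = [
    lambda x: np.array([1.0, 0.0, 0.0]),
    lambda x: np.array([0.0, 1.0, 0.0]),
    lambda x: np.array([0.0, 0.0, 1.0]),
]
image = np.zeros((224, 224, 3))
print(ensemble_predict(models, [0.8, 0.8, 0.4], image))  # [0.4 0.4 0.2]
```

Normalizing the weights to sum to 1 keeps the ensemble output a valid probability distribution, so class-specific thresholds can be applied to it directly.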

Section 04

Performance and Robustness Validation Results

Key Performance Metrics:

Metric | Value
TTA ensemble BACC (mean ± std over 3 seeds) | 0.846 ± 0.009
Best single TTA BACC | 0.8607
MEL sensitivity under clinical threshold | up to 0.877
BACC of malignant classes (MEL + BCC + AKIEC) | up to 0.839

Robustness Validation: The pipeline was run with 3 random seeds (42, 7, 123) on platforms including Google Colab (T4 GPU) and Kaggle (P100 GPU); the standard deviation of the ensemble BACC across seeds is only 0.009, indicating that results are stable across training runs.

Section 05

Core Considerations for Clinical Deployment

  • Sensitivity-First Design: in clinical deployment, a false negative (a missed malignant lesion) costs far more than a false positive, so the threshold calibration is tuned to favor sensitivity on malignant classes.
  • Interpretability: Grad-CAM visualizations let doctors check whether the model's decisions are reasonable, which builds trust.
  • Platform Independence: results were reproduced on multiple cloud platforms, suggesting the pipeline can adapt to the varied hardware found in hospitals.
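One way to implement a validation-only threshold search is sketched below, assuming a one-vs-rest view of a single class (for example MEL versus everything else) scored by the ensemble's probability for that class. The function names, the toy data, and the rule of picking the feasible threshold with the highest Youden's J (sensitivity + specificity - 1) are our assumptions, not necessarily the project's method.

```python
import numpy as np

def sens_spec(y_true, scores, thr):
    """One-vs-rest sensitivity and specificity when predicting positive for score >= thr."""
    y_true = np.asarray(y_true, dtype=bool)
    pred = np.asarray(scores) >= thr
    tp = np.sum(pred & y_true)
    fn = np.sum(~pred & y_true)
    tn = np.sum(~pred & ~y_true)
    fp = np.sum(pred & ~y_true)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return float(sens), float(spec)

def calibrate_threshold(y_true, scores, min_sens, min_spec):
    """Return the validation threshold meeting both targets with the highest
    Youden's J (sens + spec - 1), or None if no candidate threshold qualifies."""
    best = None
    for thr in np.unique(scores):  # every observed score is a candidate cut-off
        sens, spec = sens_spec(y_true, scores, thr)
        if sens >= min_sens and spec >= min_spec:
            j = sens + spec - 1.0
            if best is None or j > best[1]:
                best = (float(thr), j)
    return None if best is None else best[0]

# Toy, hypothetical validation data: 1 = target class (e.g. AKIEC), 0 = any other class.
y_val = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
p_val = [0.9, 0.8, 0.6, 0.3, 0.5, 0.2, 0.1, 0.1, 0.05, 0.4]
print(calibrate_threshold(y_val, p_val, min_sens=0.75, min_spec=0.70))  # 0.6
print(calibrate_threshold(y_val, p_val, min_sens=0.85, min_spec=0.85))  # None
```

Returning `None` when no threshold meets both targets makes the infeasible case explicit, so it can be reported rather than silently relaxed.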

Section 06

Project Significance, Limitations, and Future Directions

Project Significance:

  • Demonstrates a path from deep learning research toward clinical tools, with optimization driven by clinical needs (sensitivity first).
  • Highlights that, in medical settings, BACC and class-specific sensitivity/specificity are more clinically meaningful than overall accuracy.
  • Open-sources the complete code, pre-trained models, and documentation, lowering the barrier to entry for medical AI.

Limitations: the dataset is small (some classes have few samples), all data come from a single source (ISIC), and the multi-class outputs still need to be aggregated into the binary decisions clinicians actually make.
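As an illustration of that aggregation step, the sketch below collapses seven-class probabilities into a single malignant score by summing MEL, BCC, and AKIEC (the malignant grouping used in the results table), then applies one cut-off. The class order (assumed to follow the ISIC 2018 Task 3 ground-truth columns), the function names, and the 0.3 threshold are hypothetical; in practice such a cut-off would be calibrated on the validation set like the per-class thresholds.

```python
import numpy as np

# Assumed class order (ISIC 2018 Task 3 ground-truth columns); the project's may differ.
CLASSES = ["MEL", "NV", "BCC", "AKIEC", "BKL", "DF", "VASC"]
MALIGNANT = {"MEL", "BCC", "AKIEC"}

def malignant_probability(probs):
    """Sum the predicted probabilities of the malignant classes for each image.

    `probs` has shape (n_images, 7), columns in CLASSES order.
    """
    probs = np.atleast_2d(np.asarray(probs, dtype=float))
    idx = [i for i, c in enumerate(CLASSES) if c in MALIGNANT]
    return probs[:, idx].sum(axis=1)

def flag_malignant(probs, threshold=0.3):
    """Hypothetical binary rule: flag an image when P(malignant) >= threshold."""
    return malignant_probability(probs) >= threshold

probs = np.array([
    [0.20, 0.60, 0.10, 0.05, 0.05, 0.00, 0.00],  # most mass on NV, but 0.35 malignant
    [0.05, 0.90, 0.02, 0.01, 0.02, 0.00, 0.00],  # confidently benign
])
print(malignant_probability(probs))
print(flag_malignant(probs))  # [ True False]
```

Summing probabilities lets an image be flagged even when no single malignant class wins the argmax, which matches the sensitivity-first design.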

Future Directions: multi-modal fusion (combining images with clinical metadata), active learning (collecting hard cases for iterative improvement), and edge deployment (optimizing models to run on mobile and edge devices).