Zing Forum

Multimodal Deep Learning in Cardiac SPECT Imaging: A Myocardial Perfusion Classification System Fusing Imaging and Clinical Data

Tags: Multimodal Deep Learning · Medical Imaging AI · SPECT Myocardial Perfusion Imaging · CNN · ANN · Transfer Learning · Cardiovascular AI · Clinical Data Fusion
Published 2026-04-12 04:12 · Recent activity 2026-04-12 04:22 · Estimated read: 6 min

Section 01

Introduction: Innovative Application of Multimodal Deep Learning in Cardiac SPECT Imaging

This article introduces an innovative multimodal deep learning system that combines cardiac SPECT imaging and clinical tabular data to classify myocardial perfusion status. By fusing CNN-based image processing and ANN-based tabular data analysis, the system achieved a classification accuracy of 97.44% on 97 cases, significantly outperforming single-modal methods.

Section 02

Background: The Information Silo Problem in Medical Imaging AI

In cardiovascular disease diagnosis, SPECT myocardial perfusion imaging is a commonly used non-invasive method. However, traditional image-based deep learning models ignore patients' clinical information (medical history, laboratory indicators, etc.), leading to information silos that limit model accuracy and robustness. The open-source project multimodal-cardiac-mpi-classification addresses this issue through multimodal fusion technology.

Section 03

Methods: Dual-Branch Architecture and Data Processing Pipeline

Technical Architecture

The system adopts a late fusion strategy with dual branches processing different modalities:

  • CNN Branch: takes the 9-channel SPECT image stack, reduces the channel dimension with a 1×1 convolution, and extracts features with a fine-tuned ResNet50
  • ANN Branch: takes 20 clinical features selected by ANOVA, encoded and standardized, then processed by a fully connected network
  • Fusion Layer: combines the branch outputs via weighted late fusion or meta-learning stacking

Data Processing Pipeline

  • Imaging: Raw images → Cropping → Augmentation → Channel stacking → Feature extraction
  • Clinical Data: Indicator extraction → Cleaning → Feature selection (45→20) → ANN processing
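The clinical-data pipeline's feature-selection step (45→20) can be sketched with scikit-learn's ANOVA F-test selector; the synthetic arrays below stand in for the non-public patient records:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for 97 patients x 45 cleaned clinical indicators;
# the real features and labels are not publicly available.
rng = np.random.default_rng(0)
X = rng.normal(size=(97, 45))
y = rng.integers(0, 2, size=97)

# ANOVA F-test keeps the 20 features most associated with the label,
# then standardization prepares them for the ANN branch.
selector = SelectKBest(f_classif, k=20)
X_sel = selector.fit_transform(X, y)
X_std = StandardScaler().fit_transform(X_sel)

print(X_std.shape)  # (97, 20)
```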

Section 04

Evidence: Performance Advantages of the Fusion Model

Validation results on a dataset of 97 cases (with class imbalance):

Model Type          Accuracy
CNN (images only)   96.15%
ANN (tabular only)  91.03%
Fusion model        97.44%

Key Findings:

  1. Synergy: the fused model detects infarct cases that either single modality misses on its own
  2. Complementarity: imaging excels at structural abnormalities, while clinical data contributes patient context
  3. Robustness: performance degrades gracefully when one modality is missing
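The robustness point can be made concrete with a small late-fusion helper that falls back to whichever branch is available when a modality is missing. The function name and the 0.6 imaging weight are illustrative assumptions, not from the project:

```python
import numpy as np

def fuse(img_probs=None, tab_probs=None, w_img=0.6):
    """Weighted late fusion that degrades gracefully:
    if one modality is missing, use the other branch alone.
    The imaging weight w_img is an illustrative assumption."""
    if img_probs is None:
        return tab_probs
    if tab_probs is None:
        return img_probs
    return w_img * img_probs + (1 - w_img) * tab_probs

img = np.array([0.2, 0.8])  # imaging-branch class probabilities
tab = np.array([0.4, 0.6])  # tabular-branch class probabilities
print(fuse(img, tab))   # [0.28 0.72]
print(fuse(img, None))  # falls back to the imaging branch alone
```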

Section 05

Technical Highlights: Innovative Practices in Transfer Learning and Feature Engineering

  1. Transfer Learning: selectively unfreeze ResNet50 layers, retaining general low-level features while adapting deeper layers to medical images
  2. Imaging Tensor Construction: multi-channel stacking plus 1×1-convolution channel reduction to fit a standard CNN input
  3. Small-Dataset Generalization: regularization via data augmentation, Dropout, and early stopping
  4. Feature Selection: ANOVA F-test to choose the 20 most informative of 45 clinical features

Section 06

Limitations and Future Directions: Dataset and Technical Improvement Paths

Limitations

  • Small dataset (97 cases); generalization remains to be verified
  • Original data is not publicly available, limiting reproducibility
  • Class imbalance hampers minority-class recognition

Future Directions

  • Introduce attention mechanisms to align image regions with clinical features
  • Explore Transformer applications in multimodal medical data
  • Build large-scale public benchmark datasets

Section 07

Insights and Conclusion: Potential of Multimodal Medical AI

Insights

  • Multimodal fusion is a key lever for improving model performance
  • Clinical knowledge should be integrated deeply, from feature selection through model design
  • Medical AI demands rigorous engineering standards for data processing and validation

Conclusion

This project demonstrates the potential of multimodal deep learning in cardiovascular medicine. Fusing imaging and clinical data improves accuracy and identifies cases that single modalities struggle to recognize. We look forward to multimodal techniques playing a greater role in precision medicine.