STARAPTOR: Multi-center Renal Pathology Image Data Harmonization and Transplant Prognosis Prediction

This study examines the harmonization of multi-center renal pathology image data. By comparing six harmonization approaches, it addresses batch effects caused by differences in scanners and staining protocols across institutions and reports substantially improved accuracy of machine learning models for kidney transplant prognosis.

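Tags: data harmonization, multi-center research, renal pathology, machine learning, ComBat, batch effects, kidney transplantation, medical AI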

Published 2026-05-28 09:15 | Recent activity 2026-05-28 09:20 | Estimated read 8 min

Section 01

STARAPTOR Project Introduction: Multi-center Renal Pathology Data Harmonization Improves Transplant Prognosis Prediction

The STARAPTOR project addresses batch effects in multi-center renal pathology image data, that is, systematic bias caused by differences in scanners and staining protocols across institutions. It systematically compares six harmonization approaches and finds that ComBat performs best, substantially improving the accuracy of machine learning models that predict two kidney transplant outcomes: estimated glomerular filtration rate (eGFR) and delayed graft function (DGF). The study offers a methodological template for multi-center medical AI research.

Section 02

Batch Effect Challenges in Multi-center Medical Research

Single-institution datasets have limited sample sizes, which makes multi-center collaboration necessary. However, technical differences between hospitals (scanners, tissue processing, staining protocols) introduce batch effects that can mask real biological signals. Renal pathology is particularly sensitive: predicting transplant prognosis from donor biopsy whole-slide images (WSIs) requires precise feature quantification, and a model trained on naively pooled data from UC Davis, the University of Coimbra, and Mayo Clinic risks learning institutional artifacts rather than pathological patterns. The STARAPTOR project evaluates six harmonization approaches, counting the unharmonized baseline, to address this problem.

Section 03

Study Design and Comparison of Harmonization Methods

Data Sources and Prediction Objectives

Data: Donor kidney biopsy WSI radiomic features from UC Davis, University of Coimbra, and Mayo Clinic (165 matched features)
Prediction endpoints: 12-month post-transplant eGFR (regression), DGF (classification)

Six Harmonization Methods

Method	Principle	Applicable Scenario
Unharmonized	Raw data without harmonization	Baseline control
Z-Score	Feature standardization (zero mean, unit variance)	Simple linear offset correction
RAVEL	Linear adjustment based on reference variables	Known batch-related variables
CORAL	Correlation alignment (second-order statistic matching)	Differences in feature covariance structure
CovBat	ComBat extension that also harmonizes feature covariance across sites	Batch effects in feature covariance as well as mean and variance
ComBat	Empirical Bayesian batch correction	Classic batch effect removal
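
To make the simplest row of the table concrete, here is a minimal sketch of Z-score harmonization for a single feature. It assumes the standardization is done separately within each site, which the source does not state explicitly; the function name `zscore_per_site` and the toy values are illustrative and not taken from the STARAPTOR code.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscore_per_site(values, sites):
    """Standardize one feature separately within each site.

    values: list of floats (one feature, one entry per subject)
    sites:  list of site labels, same length as values
    """
    by_site = defaultdict(list)
    for v, s in zip(values, sites):
        by_site[s].append(v)

    # Per-site mean and population standard deviation;
    # a zero spread falls back to 1.0 to avoid division by zero.
    stats = {s: (mean(vs), pstdev(vs) or 1.0) for s, vs in by_site.items()}

    return [(v - stats[s][0]) / stats[s][1] for v, s in zip(values, sites)]


# Toy feature with an additive offset at site "B" (made-up numbers)
values = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
sites = ["A", "A", "A", "B", "B", "B"]
harmonized = zscore_per_site(values, sites)
```

After the transform, corresponding subjects at the two toy sites receive the same standardized value, because the additive site offset is removed. Unlike ComBat, this approach does not model covariates, so it can also remove genuine biological differences between sites.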

Section 04

Experimental Results: Harmonization Methods Significantly Improve Predictive Performance

Aggregated Data Experiment

eGFR prediction: XGBoost+ComBat (MSE 239) reduced MSE by 32.3% compared to unharmonized (353)
DGF prediction: XGBoost+ComBat (AUC 0.961) increased AUC by 37.5% compared to unharmonized (0.699)

Leave-One-Out (LOO) Cross-Validation (Generalization Test)

eGFR: XGBoost+LOO ComBat (MSE 372) reduced MSE by 25.5% compared to unharmonized (499)
DGF: XGBoost+LOO ComBat (AUC 0.829) increased AUC by 37.0% compared to unharmonized (0.605)

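Key findings: ComBat and CovBat are the most stable methods; XGBoost benefits the most from harmonization; and harmonization must also be applied at inference time, since models trained on harmonized data but tested on raw data (Harm→Raw) perform worse.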
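
The percentages reported in this section are relative changes against the unharmonized baseline. The short check below recomputes them from the MSE and AUC values quoted above; the helper name `relative_change` is an illustrative assumption.

```python
def relative_change(baseline, harmonized):
    """Signed change of `harmonized` relative to `baseline`, in percent."""
    return (harmonized - baseline) / baseline * 100.0


# (unharmonized, ComBat) pairs as quoted in the text
reported = {
    "eGFR MSE, aggregated": (353, 239),
    "DGF AUC, aggregated": (0.699, 0.961),
    "eGFR MSE, LOO": (499, 372),
    "DGF AUC, LOO": (0.605, 0.829),
}

changes = {
    name: round(relative_change(base, harm), 1)
    for name, (base, harm) in reported.items()
}
# Negative values are reductions (good for MSE);
# positive values are increases (good for AUC).
```

Note that the AUC gains are relative, not absolute: moving from 0.699 to 0.961 is an absolute gain of 0.262 AUC.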

Section 05

Technical Implementation and Pipeline Workflow

Reproducible Pipeline Steps

Step	Script	Function
1	01_preprocess_data.py	Load data, aggregate subjects, calculate outcomes
2	02_prepare_features.py	Match features, align naming, impute missing values
2.5	02.5_alt_harm_methods.py	Z-Score/CORAL and other harmonization
2.5	02.5_harmonize.Rmd	ComBat/CovBat harmonization (R package)
3	03_loo_combat.py	LOO ComBat + model training
3	03_train_models.py	Full scenario training
3.5	03.5_mrmr_feature_selection.py	Feature selection optimization
4	04_process_results.py	Result aggregation
5	05_visualize.py	Generate charts

Environment Configuration

Python: Preprocessing, ML modeling
R: ComBat/CovBat (ComBatFamQC package)
A config.py file is required to specify data paths
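
The source does not show what config.py contains. As a purely hypothetical sketch of a path configuration module of this kind, every directory name, variable name, and the helper `site_feature_file` below is an assumption made for illustration, not the project's actual layout.

```python
from pathlib import Path

# Hypothetical layout: all names below are illustrative assumptions,
# not the actual STARAPTOR configuration.
DATA_ROOT = Path("data")
RAW_DIR = DATA_ROOT / "raw"
HARMONIZED_DIR = DATA_ROOT / "harmonized"
RESULTS_DIR = Path("results")

# The three contributing centers named in the study
SITES = ["UC Davis", "University of Coimbra", "Mayo Clinic"]


def site_feature_file(site: str) -> Path:
    """Return the (hypothetical) per-site feature table path."""
    return RAW_DIR / f"{site.lower().replace(' ', '_')}_features.csv"
```

Keeping paths in one module lets the Python scripts import them from a single place instead of hard-coding locations in each step.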

Section 06

Methodological Insights and Clinical Significance

Reasons for ComBat's Optimal Performance

Empirical Bayesian framework handles small sample batches
Preserves biological signals
Parameters are transferable (LOO validation)
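
For reference, the standard ComBat location-scale model from the batch-correction literature can be written as follows. This is the general formulation, not a description of the exact settings used in STARAPTOR; in particular, whether biological covariates $X_{ij}$ were included is not stated in the source.

```latex
% Feature g of subject j at site i:
%   \alpha_g      overall mean of feature g
%   X_{ij}\beta_g covariate effects to preserve (biological signal)
%   \gamma_{ig}   additive site effect
%   \delta_{ig}   multiplicative site effect
y_{ijg} = \alpha_g + X_{ij}\beta_g + \gamma_{ig} + \delta_{ig}\,\varepsilon_{ijg}

% Harmonized value, using empirical Bayes estimates
% \gamma^{*}_{ig} and \delta^{*}_{ig} of the site effects:
y^{*}_{ijg} = \frac{y_{ijg} - \hat{\alpha}_g - X_{ij}\hat{\beta}_g - \gamma^{*}_{ig}}{\delta^{*}_{ig}}
              + \hat{\alpha}_g + X_{ij}\hat{\beta}_g
```

The empirical Bayes step shrinks each site's $\gamma_{ig}$ and $\delta_{ig}$ toward values pooled across features, which stabilizes the estimates when a site contributes few samples. Adding the covariate term back after correction is what preserves biological signal.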

Implications for Clinical AI Deployment

Data harmonization is essential for multi-center research
Method selection must match data characteristics
LOO validation more strictly evaluates generalization ability
Harmonization parameters can be transferred to new institutions
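
The source does not spell out what is left out in each LOO fold. Assuming it means holding out one center at a time, the fold construction could look like the sketch below; the function name `leave_one_site_out` and the toy site labels are illustrative.

```python
def leave_one_site_out(sites):
    """Yield (held_out_site, train_idx, test_idx) for each distinct site.

    sites: list of site labels, one per subject.
    """
    for held_out in sorted(set(sites)):
        train_idx = [i for i, s in enumerate(sites) if s != held_out]
        test_idx = [i for i, s in enumerate(sites) if s == held_out]
        yield held_out, train_idx, test_idx


# Toy site labels for six subjects (made-up assignment)
sites = ["UC Davis", "UC Davis", "Coimbra", "Mayo", "Mayo", "Mayo"]
folds = list(leave_one_site_out(sites))
```

In such a scheme, both the model and any label-dependent harmonization parameters would be fitted on `train_idx` only, so the held-out center plays the role of a new institution.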

Section 07

Limitations and Future Directions

Limitations

Limited sample size
Dependence on predefined radiomic features
Does not cover emerging deep learning harmonization methods

Future Directions

Integrate deep learning features with ComBat harmonization
Develop harmonization methods specific to pathology images
Establish standardized multi-center data collection protocols
Explore applicability to other organ transplants

Section 08

Project Summary: Data Harmonization is Key to Multi-center Medical AI

The STARAPTOR project demonstrates the central role of data harmonization in multi-center medical AI: ComBat performed best, substantially improving the accuracy of kidney transplant prognosis prediction and retaining its advantage in cross-center generalization. The study provides a methodological template for multi-center imaging AI, where data harmonization is a key step toward reliable and fair models in clinical deployment.