# ST-Path Survey: A Review of Multimodal Fusion Between Spatial Transcriptomics and Pathology

> A systematic review study that comprehensively summarizes multimodal fusion technologies in the fields of spatial transcriptomics and pathology, proposes a three-layer classification system, and maps the technology development roadmap from 2018 to 2025.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Posted: 2026-05-03T12:55:54.000Z
- Last activity: 2026-05-03T13:28:34.952Z
- Popularity: 148.5
- Keywords: spatial transcriptomics, pathology, multimodal fusion, foundation models, biomedical AI, GitHub, review
- Page link: https://www.zingnex.cn/en/forum/thread/st-path-survey
- Canonical: https://www.zingnex.cn/forum/thread/st-path-survey
- Markdown source: floors_fallback

---

## ST-Path Survey: Introduction to the Review of Multimodal Fusion Between Spatial Transcriptomics and Pathology

ST-Path Survey is a systematic review that comprehensively summarizes multimodal fusion techniques spanning spatial transcriptomics (ST) and pathology. It proposes a three-layer classification system (embedding layer, model layer, knowledge layer) and maps the technology development roadmap from 2018 to 2025. Maintained by ChlorineHi, this open-source project provides resources such as paper code, datasets, and a standardized evaluation framework, aiming to fill the field's lack of systematic organization and to offer a technical reference for researchers.

## Research Background and Significance

### Research Background
Spatial transcriptomics (ST) measures gene expression while preserving its spatial context within the tissue, whereas pathology analyzes tissue morphology from microscopic images. Fusing the two enables a more comprehensive understanding of disease mechanisms, especially in cancer research. In recent years, deep learning has driven rapid progress in multimodal fusion, but the field has lacked systematic organization.

### Project Significance
The ST-Path Survey project fills this gap and provides researchers with a comprehensive technical review and development roadmap.

## Detailed Explanation of the Three-Layer Classification System

### Embedding Layer Fusion
Focuses on integration at the feature representation level:
- Early fusion: Concatenate or transform raw data into a unified space; simple, but prone to losing modality-specific information;
- Late fusion: Fuse at the decision level after per-modality feature extraction; preserves modality specificity but misses cross-modal interactions;
- Middle fusion: Fuse at intermediate feature layers; balances the two trade-offs and is the current mainstream.
Representative techniques: attention-based cross-modal alignment, contrastive learning of modality representations, and autoencoders that learn a shared latent space.
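The three fusion strategies above can be sketched with toy feature vectors. All dimensions, weight matrices, and names below are illustrative assumptions, not taken from any surveyed method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-spot features: a 512-d histology patch embedding
# and a 256-d gene-expression embedding (sizes are invented).
img_feat = rng.standard_normal(512)
gene_feat = rng.standard_normal(256)

def early_fusion(a, b):
    """Concatenate raw feature vectors into one joint representation."""
    return np.concatenate([a, b])

def late_fusion(score_a, score_b, w=0.5):
    """Weighted average of per-modality decision scores (e.g. probabilities)."""
    return w * score_a + (1 - w) * score_b

def middle_fusion(a, b, W_a, W_b):
    """Project each modality into a shared latent space, then combine."""
    return np.tanh(W_a @ a + W_b @ b)

joint = early_fusion(img_feat, gene_feat)              # shape (768,)
W_a = rng.standard_normal((128, 512)) * 0.02           # toy projections
W_b = rng.standard_normal((128, 256)) * 0.02
latent = middle_fusion(img_feat, gene_feat, W_a, W_b)  # shape (128,)
score = late_fusion(0.8, 0.6)                          # fused decision score
print(joint.shape, latent.shape)
```

Middle fusion's learned projections (`W_a`, `W_b`) are what allow the modalities to interact before the decision stage, which is why the survey calls it the mainstream choice.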

### Model Layer Fusion
Focuses on network architecture design:
- Encoder-decoder: Independent encoders process modalities, and a shared decoder outputs results;
- Transformer: Self-attention handles multimodal sequences (joint modeling of ViT and gene expression);
- GNN: Model tissue slices as graphs to capture spatial dependencies;
- Hybrid architecture: Combine the advantages of CNN, Transformer, and GNN.
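As a minimal sketch of the Transformer-style fusion above, the following implements scaled dot-product cross-attention in which image patch tokens attend over gene-program tokens. Token counts and dimensions are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 64                                        # shared token dimension (illustrative)
img_tokens = rng.standard_normal((196, d))    # e.g. 14x14 ViT patch tokens
gene_tokens = rng.standard_normal((50, d))    # e.g. 50 gene-program tokens

def cross_attention(queries, keys_values):
    """Scaled dot-product attention: each query token attends over the
    other modality's tokens and returns their weighted combination."""
    scores = queries @ keys_values.T / np.sqrt(queries.shape[-1])
    # Numerically stable softmax over the key axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ keys_values, weights

fused, attn = cross_attention(img_tokens, gene_tokens)
print(fused.shape, attn.shape)   # (196, 64) (196, 50)
```

In a full model the queries, keys, and values would pass through learned linear projections; this sketch omits them to isolate the alignment mechanism itself.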

### Knowledge Layer Fusion
Focuses on domain knowledge integration:
- Prior knowledge embedding: Embed biological pathways and gene regulatory networks into models as graphs/constraints;
- Causal reasoning: Infer causal relationships between gene expression and morphological features;
- Interpretability: Attention visualization, feature attribution, etc.;
- Knowledge graph integration: Pathology-genomics knowledge graphs support downstream reasoning.
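Prior-knowledge embedding as a graph constraint can be illustrated with a Laplacian smoothness penalty: genes connected in a pathway graph are encouraged to receive similar model weights. The toy adjacency matrix below is invented; real priors would come from curated pathway databases:

```python
import numpy as np

# Toy gene-gene prior network over 4 genes (edges are invented).
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))
L = D - A                     # graph Laplacian of the prior network

def laplacian_penalty(weights, L):
    """w^T L w == sum over edges (i,j) of A[i,j] * (w_i - w_j)^2:
    penalizes weight disagreement between connected genes."""
    return float(weights @ L @ weights)

w_smooth = np.array([1.0, 1.0, 1.0, 1.0])    # agrees along every edge
w_rough = np.array([1.0, -1.0, 1.0, -1.0])   # disagrees along two edges
print(laplacian_penalty(w_smooth, L),
      laplacian_penalty(w_rough, L))          # smooth: 0.0, rough: 8.0
```

Adding such a penalty to a training loss is one simple way to inject biological structure without changing the network architecture.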

## Technology Development Roadmap (2018-2025)

### 2018-2020: Rise of Representation Learning
Deep learning applications in single modalities:
- 2018: DeepST uses CNN to process ST data;
- 2019: BERT inspires gene expression sequence modeling;
- 2020: Self-supervised learning applied to pathological images.
Breakthroughs: Spatial information encoding, gene expression dimensionality reduction, pathological image slice processing.

### 2020-2022: Exploration of Multimodal Fusion
Systematic fusion of two modalities:
- 2020: First batch of multimodal methods emerged;
- 2021: Contrastive learning showed potential;
- 2022: Attention became the standard for cross-modal alignment.
Representative works: ST-Net, DeepSpaCE, HisToGene.

### 2022-2024: Era of Foundation Models
Dominance of large-scale pre-training:
- 2022: CLIP inspired biomedical applications;
- 2023: Pathological image foundation models (UNI, Prov-GigaPath) released;
- 2024: ST foundation models emerged.
Trends: Self-supervised pre-training becomes standard, model scale grows, multi-task capability improves.

### 2024-2025: Unification and Standardization
Establishment of unified frameworks and evaluation standards:
- Construction of large-scale multi-center datasets;
- Standardized benchmark testing;
- Improvement of open-source tool ecosystems;
- Acceleration of clinical translation.

## Key Technical Challenges

### Data Heterogeneity
- Resolution mismatch: Pixel-level pathology images vs. spot-level gene expression measurements;
- Data sparsity: ST count matrices dominated by zeros vs. dense pathology images;
- Scale difference: Molecular level vs. cellular/tissue level.
Solutions: Multi-scale feature pyramids, cross-resolution alignment, missing data imputation.
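One common form of cross-resolution alignment is pooling fine-grained image features down to spot resolution, so each ST measurement gets one matched morphology vector. A minimal sketch, with invented grid and spot sizes:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setup: a 64x64 grid of patch-level image features (8-d each),
# with each ST spot covering a 32x32 block (all sizes are illustrative).
patch_feats = rng.standard_normal((64, 64, 8))

def pool_to_spots(feats, spot_size):
    """Average-pool fine-grained image features down to ST spot resolution."""
    h, w, c = feats.shape
    gh, gw = h // spot_size, w // spot_size
    # Crop to a whole number of spots, then block-average each spot region.
    pooled = feats[:gh * spot_size, :gw * spot_size].reshape(
        gh, spot_size, gw, spot_size, c).mean(axis=(1, 3))
    return pooled.reshape(gh * gw, c)

spot_feats = pool_to_spots(patch_feats, 32)
print(spot_feats.shape)   # (4, 8)
```

Real pipelines refine this with learned pooling or multi-scale feature pyramids, but the core idea is the same block-wise aggregation.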

### Interpretability Requirements
Biomedicine requires model interpretability:
- Explain the reasons for predictions;
- Identify key genes and morphological features;
- Discover new mechanisms.
Progress: attention visualization, feature attribution via SHAP and Integrated Gradients, concept activation vectors (CAVs).
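A compact sketch of Integrated Gradients, one of the attribution methods listed above, applied to a toy differentiable score (the model `f` and its inputs are purely illustrative):

```python
import numpy as np

def f(x):
    """Toy model: a smooth nonlinear score (stand-in for a real network)."""
    return np.tanh(x[0]) + x[1] ** 2

def grad_f(x, eps=1e-5):
    """Central-difference gradient of f (autodiff would be used in practice)."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x); e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

def integrated_gradients(x, baseline, steps=200):
    """Average gradients along the straight path from baseline to x,
    scaled by (x - baseline); midpoint rule approximates the integral."""
    alphas = (np.arange(steps) + 0.5) / steps
    avg_grad = np.mean(
        [grad_f(baseline + a * (x - baseline)) for a in alphas], axis=0)
    return (x - baseline) * avg_grad

x = np.array([1.0, 2.0])
baseline = np.zeros(2)
attr = integrated_gradients(x, baseline)
# Completeness axiom: attributions sum to f(x) - f(baseline).
print(attr, attr.sum(), f(x) - f(baseline))
```

The completeness check is what makes Integrated Gradients attractive for biomedicine: every attribution is accounted for against the model's actual prediction change.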

### Data Privacy and Sharing
Sensitive medical data limits dataset construction:
- Privacy regulations (HIPAA, GDPR);
- Barriers to inter-institutional sharing;
- Difficulty in obtaining annotated data.
Responses: Federated learning, synthetic data generation, transfer learning/domain adaptation.
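Federated learning in this setting can be sketched with FedAvg aggregation: each institution trains locally and shares only model weights, never raw slides or ST matrices. The weights and sample counts below are synthetic:

```python
import numpy as np

rng = np.random.default_rng(3)

# Three hypothetical hospitals, each holding a private local model update.
local_weights = [rng.standard_normal(10) for _ in range(3)]
local_sizes = [120, 300, 80]   # local training-set sizes (invented)

def fed_avg(weights, sizes):
    """FedAvg: aggregate local models as a sample-size-weighted average."""
    total = sum(sizes)
    return sum((n / total) * w for w, n in zip(weights, sizes))

global_w = fed_avg(local_weights, local_sizes)
print(global_w.shape)   # (10,)
```

Weighting by sample count keeps the global model from being dominated by small cohorts, which matters when institutions contribute very different numbers of slides.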

## Application Scenarios and Clinical Value

### Cancer Typing and Prognosis
- Distinguish subtypes that are difficult to classify with traditional methods;
- Predict prognosis and treatment response;
- Discover new therapeutic targets.

### Tumor Microenvironment Analysis
- Immune cell infiltration patterns;
- Tumor-stroma boundary features;
- Quantification of spatial heterogeneity.

### Drug Response Prediction
- Chemotherapy sensitivity prediction;
- Immune therapy response assessment;
- Drug resistance mechanism research.

## Future Development Directions and Summary

### Future Development Directions
- Large-scale pre-training: Integration of millions of slice data, optimization of self-supervised strategies;
- Multimodal foundation models: Process images/genes/text, zero-shot learning, cross-cancer generalization;
- Causal reasoning: Causal inference between genes and morphology, treatment mechanism modeling;
- Clinical translation: Real-time analysis systems, regulatory approval, workflow integration.

### Summary
ST-Path Survey provides a systematic review for the field. Its three-layer classification system and development roadmap help researchers situate individual methods within the broader technical landscape. It is an important reference for researchers in computational pathology, bioinformatics, and medical AI, and the fused field is expected to play a growing role in precision medicine.
