Multimodal Deep Learning Breaks Through T Cell Functional State Prediction: A New Method Integrating Gene Expression and TCR Sequences

This article introduces a multimodal deep learning model that fuses single-cell RNA sequencing and T cell receptor sequencing data. By integrating gene expression profiles, TCR sequence embeddings, and V/J gene usage information via a bidirectional cross-attention mechanism, it achieves high-precision classification of T cell functional states.

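Tags: T cells, multimodal deep learning, single-cell sequencing, TCR, gene expression, immunology, tumor immunology, cross-attention, PyTorch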
Published 2026-05-16 07:54 | Recent activity 2026-05-16 08:17 | Estimated read 6 min

Section 01

Introduction

Accurately identifying T cell functional states is central to tumor immunotherapy and autoimmune disease research. The open-source multimodal-tcell-classifier project proposes a multimodal deep learning architecture that integrates gene expression profiles, TCR sequence embeddings, and V/J gene usage information through a bidirectional cross-attention mechanism. It classifies seven T cell functional states with high accuracy, offering a practical tool for single-cell multi-omics analysis.

Section 02

Research Background and Challenges

T cells are the core effectors of the adaptive immune system, and their functional states determine how effective an immune response is. However, cells carrying the same TCR sequence can occupy different functional states, such as effector, memory, or exhausted, so function cannot be inferred from the TCR alone. Unimodal methods show this limit clearly: classification accuracy reaches only 33.7% with TCR sequences alone and 69.9% with gene expression alone. Fusing multiple data sources is needed to capture the full biological picture.

Section 03

Model Architecture and Training Strategy

The model fuses its modalities with a bidirectional cross-attention mechanism. Its inputs are the expression profiles of 3,000 highly variable genes, TCR-BERT embeddings of the CDR3α/β sequences, and one-hot encodings of V/J gene usage. A two-layer encoder reduces the dimensionality of the gene expression input, a pre-trained TCR-BERT extracts the TCR embeddings, and the bidirectional cross-attention layer lets each modality attend to the other. Training uses an ensemble of 8 models combined by soft voting, on data from 4 public datasets (136,000 cells), with AdamW optimization, a cosine annealing learning rate schedule, and label smoothing. The final ensemble reaches an internal test accuracy of 89.6% and a macro F1 score of 0.88.
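
To make the fusion idea concrete, here is a minimal PyTorch sketch of bidirectional cross-attention between a gene expression branch and a TCR branch. It is not the project's actual implementation: the class name, the layer sizes, the 768-dimensional TCR embedding width, the 64-wide V/J encoding, the token layout, and the mean pooling are all assumptions chosen for illustration.

```python
import torch
import torch.nn as nn


class CrossAttentionFusionSketch(nn.Module):
    """Illustrative only; every dimension below is an assumed value."""

    def __init__(self, n_genes=3000, tcr_dim=768, vj_dim=64,
                 d_model=128, n_heads=4, n_gex_tokens=4, n_classes=7):
        super().__init__()
        self.d_model = d_model
        self.n_gex_tokens = n_gex_tokens
        # Two-layer encoder that compresses the gene expression vector
        # into a few tokens (the token split is an assumption).
        self.gex_encoder = nn.Sequential(
            nn.Linear(n_genes, 512),
            nn.ReLU(),
            nn.Linear(512, n_gex_tokens * d_model),
        )
        # Projections for precomputed CDR3 embeddings and one-hot V/J usage.
        self.tcr_proj = nn.Linear(tcr_dim, d_model)
        self.vj_proj = nn.Linear(vj_dim, d_model)
        # One attention module per direction.
        self.gex_to_tcr = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.tcr_to_gex = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.classifier = nn.Linear(2 * d_model, n_classes)

    def forward(self, gex, cdr3a_emb, cdr3b_emb, vj_onehot):
        batch = gex.shape[0]
        g = self.gex_encoder(gex).view(batch, self.n_gex_tokens, self.d_model)
        # TCR branch: three tokens (CDR3 alpha, CDR3 beta, V/J usage).
        t = torch.stack(
            [self.tcr_proj(cdr3a_emb), self.tcr_proj(cdr3b_emb), self.vj_proj(vj_onehot)],
            dim=1,
        )
        # Gene tokens query the TCR tokens, and vice versa.
        g_attended, _ = self.gex_to_tcr(g, t, t)
        t_attended, _ = self.tcr_to_gex(t, g, g)
        # Residual connection, mean-pool each branch, then concatenate.
        fused = torch.cat(
            [(g + g_attended).mean(dim=1), (t + t_attended).mean(dim=1)], dim=-1
        )
        return self.classifier(fused)


model = CrossAttentionFusionSketch()
logits = model(
    torch.randn(5, 3000),  # expression of 3,000 highly variable genes
    torch.randn(5, 768),   # stand-in for a CDR3 alpha embedding
    torch.randn(5, 768),   # stand-in for a CDR3 beta embedding
    torch.zeros(5, 64),    # stand-in for one-hot V/J usage
)
print(logits.shape)
```

Each branch is split into several tokens because attention over a single key reduces to a plain linear projection; with multiple tokens per modality, the attention weights can differ between cells.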

Section 04

Functional State Classification and Generalization Performance

The model assigns T cells to 7 states, listed here with example markers and per-class F1 scores: Treg (FOXP3, F1=0.94), effector T (GZMB, F1=0.91), proliferating (MKI67, F1=0.90), memory T (IL7R, F1=0.89), naive T (CCR7, F1=0.86), exhausted T (PDCD1, F1=0.83), and Th_effector (F1=0.75). External validation accuracy is 86.4% on non-small cell lung cancer datasets, 67.2% on glioblastoma (where exhausted T cells are classified poorly), and 62.6% on skin cancer (where the boundary between naive and memory T cells is blurred).
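
Macro F1 is the unweighted mean of the per-class F1 scores, so every state counts equally regardless of how many cells it contains. As a quick check, the short calculation below averages the seven rounded per-class values quoted above; the dictionary keys are just labels chosen for this example.

```python
# Per-class F1 scores as quoted in the article (rounded to two decimals).
per_class_f1 = {
    "Treg": 0.94,
    "effector T": 0.91,
    "proliferating": 0.90,
    "memory T": 0.89,
    "naive T": 0.86,
    "exhausted T": 0.83,
    "Th_effector": 0.75,
}

# Macro F1: plain average over classes, with no weighting by class size.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
print(f"{macro_f1:.3f}")  # 0.869
```

The rounded values average to about 0.87, slightly below the macro F1 of 0.88 quoted for the final ensemble. The article does not say whether the per-class figures and the macro score come from the same evaluation run, so the gap may reflect rounding or a different model variant.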

Section 05

Comparative Analysis and Application Tools

Ablation experiments show accuracy of 33.7% with TCR alone, 69.9% with gene expression alone, 79.3% after adding TCR embeddings, and 88.1% after adding V/J usage and the complete gene expression input. Cross-attention contributes a further 0.7 percentage points and ensembling another 0.8, which brings the total to the 89.6% reported for the final ensemble. Compared with an XGBoost baseline, XGBoost is slightly better on the internal test set (90.6%), but the neural network generalizes better to external cohorts (ahead by 8.2% on non-small cell lung cancer). The tooling includes pip installation, the predict_report.py script, and automatic weight downloading. Outputs include predictions.csv, annotated.h5ad, and an interactive report, and the tool can also be used through a Python API.
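
The ensemble gain comes from soft voting, which averages the class probabilities of the member models before taking the most likely class. The sketch below shows the idea in plain Python with three hypothetical members and three classes (the project uses 8 members and 7 classes); the function name and the probabilities are invented for illustration.

```python
def soft_vote(member_probs):
    """Average class probabilities across members and return (label, means)."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    mean_probs = [
        sum(probs[c] for probs in member_probs) / n_members
        for c in range(n_classes)
    ]
    label = max(range(n_classes), key=lambda c: mean_probs[c])
    return label, mean_probs


# Hypothetical per-member probabilities for one cell.
members = [
    [0.50, 0.40, 0.10],  # member 1 narrowly prefers class 0
    [0.20, 0.70, 0.10],  # member 2 strongly prefers class 1
    [0.45, 0.40, 0.15],  # member 3 narrowly prefers class 0
]

label, mean_probs = soft_vote(members)
print(label)  # 1
```

In this toy case a majority (hard) vote would pick class 0, because two of the three members rank it first, while soft voting picks class 1 because the one confident member outweighs two narrow preferences.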

Section 06

Limitations and Future Directions

The model has three main limitations: the 7-category framework mixes lineage, function, and cell cycle dimensions; cross-tissue generalization is unstable (for example, poor exhausted T classification in glioblastoma); and the proliferating class attracts false-positive predictions. Future directions include tissue-specific models, hierarchical classification strategies, and improved normalization and domain adaptation techniques.