# pmhc-present: AI-based Research on Tumor Neoantigen-HLA Binding Prediction and Fairness Evaluation

> A research project for UCL's COMP0190 course that systematically compares sequence-based and structure-based models for predicting tumor neoantigen presentation, with a special focus on whether predictions are equally accurate across populations of different ancestral backgrounds.

- Board: [Openclaw Geo](https://www.zingnex.cn/en/forum/board/openclaw-geo)
- Published: 2026-06-16T18:09:50.000Z
- Last activity: 2026-06-16T18:22:03.339Z
- Keywords: tumor neoantigens, HLA binding prediction, AlphaFold, immunotherapy, fairness evaluation, structural biology, NetMHCpan, cancer vaccines
- Page link: https://www.zingnex.cn/en/forum/thread/pmhc-present-ai-hla
- Canonical: https://www.zingnex.cn/forum/thread/pmhc-present-ai-hla

---

## [Introduction] pmhc-present: AI-driven Tumor Neoantigen Prediction and Population Fairness Research

pmhc-present is a research project for UCL's COMP0190 course. It systematically compares sequence models (e.g., NetMHCpan) with structural models (based on AlphaFold) for predicting tumor neoantigen-HLA binding, with a special focus on prediction fairness across populations of different ancestral backgrounds. The core research questions are how well structural models predict binding for rare HLA alleles, whether sequence and structural features work synergistically, and what each model has actually learned, probed through mutation scanning. The project also emphasizes the ethical dimension of genomic medicine, aiming to ensure that these technologies are applied fairly.

## Research Background: Core Challenges in Tumor Immunotherapy

Tumor neoantigens are novel protein fragments produced by gene mutations in cancer cells. When HLA molecules present them on the cell surface, they can activate the immune system, which makes them key targets for immunotherapy and personalized vaccines. To be recognized, however, a peptide must first bind an HLA molecule and form a stable complex. HLA genes are extremely polymorphic (over 30,000 known alleles), and allele frequencies differ markedly between populations, so accurately predicting peptide-HLA binding is a major challenge in tumor immunoinformatics.

## Research Motivation: A Structural Perspective Beyond Sequences

Current mainstream tools such as NetMHCpan rely on sequence information. They perform well on common HLA alleles but cannot directly model 3D structure. Structural information can capture the positions of anchor residues and the geometric complementarity between the peptide and the binding pockets, and may therefore generalize better to rare HLA alleles for which training data are scarce. The project's core question is: can structural information improve prediction accuracy, especially for underrepresented HLA alleles?
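
To make "geometric complementarity" concrete, the sketch below computes a peptide-by-HLA residue contact map from 3D coordinates, one of the fast structural features the project mentions. It is an illustration under stated assumptions, not the project's code: it takes one representative coordinate per residue (for example, C-alpha atoms from a predicted structure), and the 8 Å cutoff is a common convention rather than a value from the source. The function names are hypothetical.

```python
import numpy as np


def contact_map(peptide_xyz: np.ndarray, hla_xyz: np.ndarray, cutoff: float = 8.0) -> np.ndarray:
    """Boolean (n_peptide, n_hla) matrix: True where two residues lie within `cutoff` angstroms.

    Assumes one representative coordinate per residue (e.g. C-alpha), shape (n, 3).
    The 8.0 default is a common convention, not a value taken from pmhc-present.
    """
    if peptide_xyz.ndim != 2 or peptide_xyz.shape[1] != 3:
        raise ValueError("peptide_xyz must have shape (n, 3)")
    if hla_xyz.ndim != 2 or hla_xyz.shape[1] != 3:
        raise ValueError("hla_xyz must have shape (m, 3)")
    # Broadcast to (n, m, 3) coordinate differences, then take Euclidean distances.
    diff = peptide_xyz[:, None, :] - hla_xyz[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return dist < cutoff


def contacts_per_position(cmap: np.ndarray) -> np.ndarray:
    """Number of HLA residues in contact with each peptide position."""
    return cmap.sum(axis=1)
```

Summing each row gives a per-position contact count; one would expect buried anchor positions to stand out, which is the kind of signal a pure sequence encoding has to infer indirectly.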

## Research Design and Methodology Framework

The project is organized around three progressive research questions:

1. Compare AlphaFold-derived structural features with pure sequence features on HLA alleles of different frequencies, testing the hypothesis that structural models generalize better to rare alleles.
2. Explore how to integrate sequence and structural features (e.g., feature concatenation or multimodal attention), and evaluate the performance differences both overall and across HLA groups.
3. Probe what the models learn: use computational saturation mutagenesis scanning to compare which anchor residues each model identifies and how sensitive each is to the flexible regions of the peptide.
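
Research question 3 relies on computational saturation mutagenesis. Below is a minimal sketch of such a scan, assuming only that each model can be wrapped as a function from a peptide string to a binding score: it enumerates every single-residue substitution over the 20 standard amino acids and reports, for each position, the mean absolute change in score. Running it once with the sequence model and once with the structural model, then comparing the two profiles, is one way to see which positions each treats as anchors. `toy_score` is a hypothetical stand-in that rewards Leu at position 2 and Val at the C-terminus, loosely mimicking an HLA-A*02:01-like motif; it is not either model.

```python
from typing import Callable, List, Tuple

# The 20 standard amino acids used as substitution targets.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_mutants(peptide: str) -> List[Tuple[int, str, str]]:
    """Every single-residue substitution as (position, new residue, mutant peptide)."""
    mutants = []
    for i, wild_type in enumerate(peptide):
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                mutants.append((i, aa, peptide[:i] + aa + peptide[i + 1:]))
    return mutants


def position_sensitivity(peptide: str, score: Callable[[str], float]) -> List[float]:
    """Mean absolute score change per position across all substitutions at that position."""
    base = score(peptide)
    totals = [0.0] * len(peptide)
    counts = [0] * len(peptide)
    for i, _aa, mutant in single_mutants(peptide):
        totals[i] += abs(score(mutant) - base)
        counts[i] += 1
    return [total / count for total, count in zip(totals, counts)]


def toy_score(peptide: str) -> float:
    """Hypothetical stand-in scorer: +1 for Leu at position 2, +1 for a C-terminal Val."""
    return float(peptide[1] == "L") + float(peptide[-1] == "V")


if __name__ == "__main__":
    print(position_sensitivity("KLAAAAAAV", toy_score))
    # → [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
```

With a real model, each scan costs 19 × L predictions for a peptide of length L; that is cheap for a sequence model but can be expensive if every mutant must be refolded by AlphaFold.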

## Fairness Evaluation: Ethical Dimension of Genomic Medicine

HLA allele frequencies vary significantly across populations of different ancestral backgrounds. If a model performs poorly on rare alleles, its predictions will be systematically less accurate for the populations in which those alleles are more common. The project validates its models on the TRACERx non-small cell lung cancer dataset, keeping public benchmark datasets separate from the controlled-access validation data to ensure rigorous data management.
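
One way to quantify this concern is stratified evaluation: compute AUC separately for each group (an HLA allele-frequency bin for research question 1, or an ancestry group) and report the gap between the best- and worst-served groups. The sketch below assumes predictions arrive as `(group, label, score)` tuples with binary labels; the grouping scheme and the max-minus-min gap are illustrative choices, not metrics stated by the project. AUC is computed directly from its Mann-Whitney definition, which is quadratic in group size and meant only for clarity.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def auc(pos_scores: List[float], neg_scores: List[float]) -> float:
    """ROC AUC as P(positive scores above negative); ties count as half a win."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def per_group_auc(records: Iterable[Tuple[str, int, float]]) -> Dict[str, float]:
    """AUC per group; groups lacking positives or negatives are skipped (AUC undefined)."""
    pos: Dict[str, List[float]] = defaultdict(list)
    neg: Dict[str, List[float]] = defaultdict(list)
    for group, label, score in records:
        (pos if label == 1 else neg)[group].append(score)
    groups = sorted(set(pos) | set(neg))
    return {g: auc(pos[g], neg[g]) for g in groups if pos[g] and neg[g]}


def fairness_gap(group_auc: Dict[str, float]) -> float:
    """Difference between the best- and worst-performing group (illustrative metric)."""
    values = list(group_auc.values())
    return max(values) - min(values)
```

Skipping groups without both classes is necessary for AUC, but with rare alleles it can silently drop exactly the groups of interest, so it is worth reporting how many groups were excluded alongside the gap.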

## Technical Implementation Details

The data-processing pipeline extracts validation peptides from the MHC Motif Atlas, generates length-matched negative samples (in either a proteome mode or a fast baseline mode), and maps each HLA allele to its pseudosequence. Structural feature extraction relies on AlphaFold refolding, which is computationally expensive, so the project allocates resources by flagging which features require refolding (e.g., pLDDT) and which can be computed quickly (e.g., contact maps).
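
The article does not specify how the proteome mode draws its negatives. A common approach, sketched here under that assumption, is to sample random windows from proteome sequences whose lengths match each positive peptide, so that a classifier cannot separate the classes by length alone. The function name, its parameters, and the exclusion rule (reject any window that equals a known positive or an already-sampled negative) are hypothetical; the fast baseline mode is not reproduced.

```python
import random
from typing import Dict, List, Sequence


def length_matched_negatives(
    positives: Sequence[str],
    proteome: Dict[str, str],
    per_positive: int = 1,
    seed: int = 0,
    max_tries: int = 10_000,
) -> List[str]:
    """Sample proteome windows whose lengths mirror the positive peptides.

    `proteome` maps protein IDs to amino-acid sequences (assumed format).
    Windows identical to a positive, or to an earlier negative, are rejected.
    """
    rng = random.Random(seed)  # seeded so the decoy set is reproducible
    seen = set(positives)
    negatives: List[str] = []
    for peptide in positives:
        k = len(peptide)
        # Only proteins long enough to contain a window of length k are eligible.
        eligible = [seq for seq in proteome.values() if len(seq) >= k]
        if not eligible:
            continue
        made, tries = 0, 0
        while made < per_positive and tries < max_tries:
            tries += 1
            seq = rng.choice(eligible)
            start = rng.randrange(len(seq) - k + 1)
            window = seq[start:start + k]
            if window in seen:
                continue
            seen.add(window)
            negatives.append(window)
            made += 1
    return negatives
```

A production pipeline would likely also exclude windows that appear as positives for any allele in the dataset, not only in the current list, to avoid mislabeling true ligands as negatives.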

## Project Limitations and Future Outlook

The project is currently in beta: some functions, such as large-scale AlphaFold refolding and the complete training pipeline, run only on GPU servers. Future directions include extending the work to HLA class II molecules, integrating T-cell receptor cross-reactivity prediction, and developing lightweight structural feature extraction methods that lower the computational barrier.

## Conclusion

pmhc-present combines deep learning with structural biology while keeping the fairness of its applications in view, an important direction in computational oncology. As precision immunotherapy develops rapidly, such research matters not only academically but also for ensuring that all patients benefit equally from advances in genomic medicine.
