Zing Forum

NUS Capstone: Research on Cancer Gene Prediction Using Graph Neural Networks

NUS Capstone Project: Using Graph Neural Network technology to predict cancer-related genes and promote the development of precision medicine.

Tags: Graph Neural Networks, Cancer Genes, Bioinformatics, Precision Medicine, Gene Prediction, GNN, Capstone Project, NUS
Published 2026-05-21 22:42 | Recent activity 2026-05-21 22:54 | Estimated read 8 min

Section 01

Introduction: NUS Capstone Project—Predicting Cancer Genes with Graph Neural Networks to Support Precision Medicine

This Capstone Project at the National University of Singapore (NUS) sits at the intersection of computer science and biomedicine. It uses Graph Neural Networks (GNNs) to predict cancer-related genes, offering a computational complement to traditional gene research methods, which are time-consuming and labor-intensive, and supporting the development of precision medicine. The project draws together what students have learned in their degree and aims to model the complex relationships between genes with GNNs in order to identify cancer driver genes.

Section 02

Project Background and Scientific Significance

Definition of Capstone Project

The Capstone Project is a mandatory course for senior students in majors such as Computer Science at NUS. Students must complete a research task of practical significance, and the course tests their ability to solve complex problems.

Biological Background of Cancer Genes

Cancer is a genetic disease. Mutations can make proto-oncogenes overactive, turning them into oncogenes, while mutations in tumor suppressor genes cause them to lose their function. Both can lead to uncontrolled cell proliferation. Identifying these genes is crucial for understanding disease mechanisms and developing targeted drugs.

Applicability of GNNs

Genes are connected through interaction networks (e.g., protein-protein interaction networks), which map naturally onto graphs, with genes as nodes and interactions as edges. GNNs can capture relational patterns between nodes, which is difficult for traditional neural networks that expect fixed-size, independent inputs.

Section 03

Technical Architecture and Methodology

Data Preparation and Preprocessing

  • Data sources: Gene interaction networks (STRING, BioGRID), cancer gene annotations (COSMIC, TCGA), gene features (expression data, ontology annotations, sequence features)
  • Preprocessing: Removing low-confidence edges, standardizing features, and splitting genes into training/validation/test sets
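
The three preprocessing steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the function names, the 0.7 confidence cutoff, the 60/20/20 split, and the placeholder gene IDs are all hypothetical and are not taken from the project.

```python
import random
from statistics import mean, pstdev


def filter_edges(edges, min_score=0.7):
    """Keep only edges whose confidence score is at least min_score (assumed cutoff)."""
    return [(u, v) for u, v, score in edges if score >= min_score]


def standardize(features):
    """Z-score each feature column; constant columns become all zeros."""
    columns = list(zip(*features.values()))
    stats = [(mean(col), pstdev(col)) for col in columns]
    return {
        gene: [(x - mu) / sd if sd > 0 else 0.0 for x, (mu, sd) in zip(vec, stats)]
        for gene, vec in features.items()
    }


def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split genes into train/val/test, keeping the class ratio in each part."""
    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for cls in sorted(set(labels.values())):
        genes = sorted(g for g, y in labels.items() if y == cls)
        rng.shuffle(genes)
        n_train = round(fractions[0] * len(genes))
        n_val = round(fractions[1] * len(genes))
        splits["train"] += genes[:n_train]
        splits["val"] += genes[n_train:n_train + n_val]
        splits["test"] += genes[n_train + n_val:]  # remainder goes to the test set
    return splits


# Toy inputs with placeholder gene IDs and made-up scores.
edges = [("G0", "G1", 0.9), ("G1", "G2", 0.4), ("G2", "G3", 0.7)]
kept = filter_edges(edges)
labels = {f"G{i}": int(i < 5) for i in range(10)}
parts = stratified_split(labels)
```

Splitting per class matters here because a purely random split of a heavily imbalanced label set can leave the validation or test set with almost no positive genes.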

GNN Model

The model relies on message passing: each node aggregates information from its neighbors to update its own representation. Variants such as GCN, GAT, and GraphSAGE may be used.
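
To make the message-passing idea concrete, here is a minimal sketch of one layer that averages each node's vector with its neighbors' vectors and then applies a shared linear map with ReLU. It is a simplified mean aggregator, not the exact normalization of GCN, GAT, or GraphSAGE, and the function names, toy graph, and identity weight matrix are illustrative assumptions.

```python
def mean_aggregate(adjacency, features):
    """Average each node's own vector with the vectors of its neighbors."""
    updated = {}
    for node, vec in features.items():
        group = [vec] + [features[n] for n in adjacency.get(node, [])]
        updated[node] = [sum(vals) / len(group) for vals in zip(*group)]
    return updated


def linear_relu(features, weight):
    """Apply a shared weight matrix (in_dim x out_dim) and ReLU to every node."""
    return {
        node: [max(0.0, sum(x * w for x, w in zip(vec, col))) for col in zip(*weight)]
        for node, vec in features.items()
    }


def message_passing_layer(adjacency, features, weight):
    """One layer: aggregate neighbor information, then transform it."""
    return linear_relu(mean_aggregate(adjacency, features), weight)


# Toy path graph G0 - G1 - G2 with made-up 2-d features.
adjacency = {"G0": ["G1"], "G1": ["G0", "G2"], "G2": ["G1"]}
features = {"G0": [1.0, 0.0], "G1": [0.0, 1.0], "G2": [1.0, 1.0]}
identity = [[1.0, 0.0], [0.0, 1.0]]  # stands in for a learned weight matrix
hidden = message_passing_layer(adjacency, features, identity)
```

Stacking two such layers lets a gene's representation depend on genes two hops away, which is how information about a wider neighborhood reaches each node.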

Node Classification Task

Prediction is framed as a node classification task: the model takes the gene network and gene features as input, outputs for each gene the probability that it is a cancer gene, and is trained with binary cross-entropy loss.
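
As a small illustration of the training objective, the sketch below turns per-gene model scores (logits) into probabilities and computes the mean binary cross-entropy over labelled genes. The function names are illustrative, and in practice a deep learning framework's built-in loss would normally be used instead.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_cross_entropy(logits, labels, eps=1e-12):
    """Mean binary cross-entropy; labels are 1 (cancer gene) or 0 (other)."""
    total = 0.0
    for z, y in zip(logits, labels):
        p = min(max(sigmoid(z), eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


# Made-up logits for three genes, the first of which is labelled positive.
loss = binary_cross_entropy([2.0, -1.0, 0.5], [1, 0, 0])
```

The loss is small when positive genes receive high probabilities and negative genes low ones, and grows quickly for confident wrong predictions.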

Model Evaluation

Because the classes are imbalanced, evaluation uses ROC-AUC, Precision-Recall AUC, and Top-K accuracy, with cross-validation to check that results are stable.
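
The metrics named above can be sketched without any library. Two assumptions apply: the Precision-Recall AUC is approximated by average precision, and "Top-K accuracy" is interpreted here as precision among the K highest-scoring genes, which may differ from the definition the project used. The functions also assume at least one positive and one negative label.

```python
def roc_auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Mean of the precision values measured at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


def precision_at_k(scores, labels, k):
    """Fraction of true positives among the k highest-scoring genes."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    return sum(y for _, y in ranked[:k]) / k


# Made-up scores for five genes, two of which are labelled positive.
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
labels = [1, 0, 1, 0, 0]
summary = (roc_auc(scores, labels), average_precision(scores, labels),
           precision_at_k(scores, labels, 2))
```

Ranking-based metrics such as these are less misleading than plain accuracy when positives are rare, since a model that predicts "not a cancer gene" everywhere would still score high accuracy.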

Section 04

Scientific Findings and Potential Value

  • Novel cancer gene candidates: Identifies understudied potential driver genes for experimental validation.
  • Understanding gene network topology: Analyzes key interactions via attention weights.
  • Drug target discovery: Predicted cancer genes can serve as candidate targets for drug development; for example, inhibiting an overactive oncogene may help treat cancer.

Section 05

Technical Challenges and Solutions

Data Sparsity (Class Imbalance)

Solutions: Using class weights, over/under sampling, graph autoencoder pre-training.
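
As one example of the class-weighting option, the sketch below up-weights the rare positive class by the ratio of negatives to positives inside the cross-entropy loss. The weighting rule and function names are illustrative assumptions; the project may have chosen weights differently.

```python
import math


def positive_class_weight(labels):
    """Ratio of negatives to positives, used to up-weight the rare class."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return n_neg / n_pos


def weighted_bce(probs, labels, pos_weight, eps=1e-12):
    """Binary cross-entropy in which errors on positives count pos_weight times."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


# Toy label set with one positive gene among five.
labels = [1, 0, 0, 0, 0]
weight = positive_class_weight(labels)
loss = weighted_bce([0.5] * 5, labels, weight)
```

With this weighting, the single positive gene contributes as much to the loss as the four negatives combined, so the model cannot minimize the loss simply by predicting the majority class.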

Network Noise

Solutions: Filtering low-quality edges, attention mechanisms to learn edge importance, integrating multiple data sources.
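
To illustrate how an attention mechanism can assign different importance to different edges, here is a simplified GAT-style sketch: each neighbor gets a raw score from a scoring vector applied to the concatenated node features, and a softmax turns the scores into weights. The learned linear transform that GAT applies before scoring is omitted, and the scoring vector is fixed by hand here rather than learned; all names and values are hypothetical.

```python
import math


def leaky_relu(x, slope=0.2):
    """LeakyReLU nonlinearity applied to raw attention scores."""
    return x if x > 0 else slope * x


def attention_weights(node, neighbors, features, attn_vector):
    """Softmax-normalized attention of `node` over its neighbors."""
    raw = []
    for n in neighbors:
        pair = features[node] + features[n]  # concatenate the two vectors
        raw.append(leaky_relu(sum(a * x for a, x in zip(attn_vector, pair))))
    peak = max(raw)  # subtract the maximum for numerical stability
    exps = [math.exp(r - peak) for r in raw]
    norm = sum(exps)
    return {n: e / norm for n, e in zip(neighbors, exps)}


# Made-up 1-d features; the scoring vector only looks at the neighbor's feature.
features = {"G0": [1.0], "G1": [2.0], "G2": [0.0]}
weights = attention_weights("G0", ["G1", "G2"], features, [0.0, 1.0])
```

An edge that consistently receives a weight near zero contributes little to the aggregated representation, which is how attention can dampen the effect of noisy interactions.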

Interpretability Requirements

Solutions: GNNExplainer for interpreting individual predictions, visualizing attention weights, pathway enrichment analysis to check that results are biologically plausible.

Section 06

Comparison with Traditional Methods: Advantages of GNNs

Traditional Machine Learning Methods

  • Rely on hand-crafted features (topology, sequence) fed to models such as random forests or SVMs
  • Disadvantages: Feature engineering requires expert knowledge; difficult to capture complex network patterns

Advantages of GNNs

  • End-to-end learning, automatic feature extraction
  • Naturally models gene interactions
  • Scalable to handle large-scale networks
  • Supports transfer learning to similar networks

Section 07

Future Development Directions

  • Multi-omics data fusion: Integrate genomic variation, epigenetics, transcriptomics, etc., to improve accuracy.
  • Modeling specific cancer types: Train specialized models for lung cancer, breast cancer, etc.
  • Dynamic network modeling: Use temporal GNNs to capture dynamic changes in gene networks.
  • Drug response prediction: Extend the model to predict the impact of gene mutations on drug responses, guiding precision medication.

Section 08

Value of Interdisciplinary Collaboration and Project Significance Summary

Value of Interdisciplinary Collaboration

  • Computer science: Gains real-world graph learning scenarios, promoting GNN applications in bioinformatics.
  • Biomedicine: Gains a high-throughput computational screening tool that surfaces genes hard to identify with traditional methods, shortening the path from data to testable hypotheses.

Project Significance

This project is a microcosm of "AI for Science", in which AI becomes an engine for scientific discovery and shifts the paradigm of biomedical research: from hypothesis-driven to data-driven, from single-gene studies to systems biology, and from lab trial-and-error to computational prediction. It offers a reference for work in AI applications, bioinformatics, and precision medicine, and is expected to contribute to progress in cancer research and treatment.