Input Layer:
- Multi-omics feature vectors (gene expression, mutation, copy number variation, etc.)
- Multiplex adjacency matrices (one matrix per network type)
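The two inputs above can be sketched as a gene-by-feature matrix plus a stack of adjacency matrices, one per network type. The sizes and the random data here are purely illustrative, not taken from the paper:

```python
import numpy as np

# Hypothetical sizes: 5 genes, 4 omics features, 2 network types
n_genes, n_feats, n_nets = 5, 4, 2
rng = np.random.default_rng(0)

# Multi-omics feature vectors: one row per gene
# (e.g. expression, mutation frequency, copy-number values)
X = rng.random((n_genes, n_feats))

# Multiplex adjacency: one matrix per network type; a directed network
# is simply an asymmetric matrix in this stack
A = rng.integers(0, 2, size=(n_nets, n_genes, n_genes)).astype(float)
```

Keeping the networks as a third tensor axis (rather than merging them into one graph) is what lets later layers learn a separate kernel per network type.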
Directed Graph Convolution Layer:
Traditional Graph Convolutional Networks (GCNs) assume an undirected graph in which every neighbor contributes equally to the central node. MNDGNN's directed graph convolution instead accounts for:
- Neighbor Diversity: Different types of neighbors (upstream regulators, downstream targets, interacting proteins) should be treated differently
- Degree Diversity: The in-degree and out-degree of a node reflect its different roles in the network
In the implementation, the model learns an independent convolution kernel for each network type and aggregates the per-network representations through an attention mechanism.
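A minimal sketch of one such layer, assuming separate kernels for in-neighbors (upstream regulators, via the transposed adjacency) and out-neighbors (downstream targets), followed by softmax attention over network types. All function and parameter names here are hypothetical, not the paper's:

```python
import numpy as np

def row_normalize(M):
    """Divide each row by its degree so neighbor messages are averaged."""
    d = M.sum(axis=1, keepdims=True)
    return M / np.where(d == 0, 1.0, d)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def multiplex_directed_conv(X, A_list, W_in, W_out, a):
    """One directed convolution over a stack of networks.

    X:      (n_genes, n_feats) node features
    A_list: (n_nets, n_genes, n_genes) adjacency per network type
    W_in/W_out: per-network kernels for in- and out-neighbors
    a:      attention vector scoring each network's summary
    """
    per_net = []
    for k, A in enumerate(A_list):
        h_out = row_normalize(A) @ X @ W_out[k]     # messages from targets
        h_in = row_normalize(A.T) @ X @ W_in[k]     # messages from regulators
        per_net.append(np.tanh(h_out + h_in))
    H = np.stack(per_net)                           # (n_nets, n_genes, d)
    alpha = softmax(np.array([h.mean(axis=0) @ a for h in H]))
    return np.tensordot(alpha, H, axes=1)           # (n_genes, d)

# Illustrative usage with random data
rng = np.random.default_rng(1)
n, f, d, k = 6, 4, 3, 2
X = rng.random((n, f))
A_list = rng.integers(0, 2, size=(k, n, n)).astype(float)
W_in, W_out = rng.random((k, f, d)), rng.random((k, f, d))
H = multiplex_directed_conv(X, A_list, W_in, W_out, rng.random(d))
```

Using `A` and `A.T` with distinct kernels is one common way to realize the "neighbor diversity" and "degree diversity" ideas: a gene's role as regulator versus target is encoded by which direction the message travels.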
Data Augmentation Module:
To address the label scarcity problem, MNDGNN adopts a two-pronged strategy:
- Positive Sample Augmentation: the positive set is expanded by adding network neighbors of known cancer driver genes that show high similarity to them
- Negative Sample Inference: Uses anomaly detection algorithms (e.g., DeepOD) to identify "high-confidence non-driver genes" from a large number of unlabeled genes as negative samples
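The two augmentation steps can be sketched as follows. The cosine-similarity rule for positives and the centroid-distance score for negatives are simple stand-ins for the paper's actual criteria (in particular, the distance score substitutes for a DeepOD-style anomaly detector); thresholds and names are hypothetical:

```python
import numpy as np

def augment_positives(pos_idx, A, X, sim_thresh=0.9):
    """Add unlabeled neighbors whose feature cosine similarity to a
    known driver exceeds sim_thresh (illustrative similarity rule)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    new = set()
    for i in pos_idx:
        for j in np.nonzero(A[i])[0]:
            if j not in pos_idx and Xn[i] @ Xn[j] >= sim_thresh:
                new.add(int(j))
    return sorted(new)

def infer_negatives(X, pos_idx, n_neg):
    """Pick the unlabeled genes farthest from the positive-class
    centroid as high-confidence negatives (stand-in for DeepOD)."""
    centroid = X[pos_idx].mean(axis=0)
    dist = np.linalg.norm(X - centroid, axis=1)
    unlabeled = [i for i in range(len(X)) if i not in pos_idx]
    return sorted(unlabeled, key=lambda i: -dist[i])[:n_neg]

# Tiny worked example: gene 0 is a known driver
X = np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.5, 0.5]])
A = np.array([[0, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]], float)
pos = augment_positives([0], A, X)      # neighbor 1 is similar, neighbor 2 is not
neg = infer_negatives(X, [0], 2)        # the two genes least like the drivers
```

The point of the split is that reliable negatives matter as much as extra positives: training against randomly chosen unlabeled genes would contaminate the negative class with undiscovered drivers.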
Prediction Layer:
A Multi-Layer Perceptron (MLP) outputs the probability that each gene is a cancer driver, with class weights in the loss to counter the heavy imbalance between the few known drivers and the many non-drivers.
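A minimal sketch of the prediction head: a two-layer MLP with a sigmoid output, trained with a binary cross-entropy loss that up-weights the rare positive class. The weight value and layer sizes are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(X, W1, b1, W2, b2):
    """Two-layer MLP: ReLU hidden layer, sigmoid driver probability."""
    h = np.maximum(0.0, X @ W1 + b1)
    return sigmoid(h @ W2 + b2).ravel()

def weighted_bce(p, y, w_pos):
    """Binary cross-entropy with weight w_pos on the positive class,
    so misclassifying a rare driver gene costs more."""
    eps = 1e-12
    w = np.where(y == 1, w_pos, 1.0)
    return float(-(w * (y * np.log(p + eps)
                        + (1 - y) * np.log(1 - p + eps))).mean())

# Illustrative usage: 4 genes, 3 features, hidden width 5
rng = np.random.default_rng(2)
X = rng.random((4, 3))
probs = mlp_forward(X, rng.random((3, 5)), rng.random(5),
                    rng.random((5, 1)), rng.random(1))
loss = weighted_bce(probs, np.array([1, 0, 0, 0]), w_pos=5.0)
```

A common choice for `w_pos` is the ratio of negatives to positives in the training set, which makes both classes contribute comparably to the gradient.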