MNDGNN: A Method for Identifying Cancer Driver Genes Based on Multiplex Networks and Directed Graph Neural Networks

This article introduces MNDGNN, a directed graph neural network built on multiplex biological networks. By integrating multi-omics data and applying data augmentation, it addresses label scarcity and class imbalance in cancer driver gene identification.

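Tags: MNDGNN, cancer driver genes, graph neural networks, multiplex networks, multi-omics, precision medicine, bioinformatics, deep learning, data augmentation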

Published 2026-04-30 16:15 | Recent activity 2026-04-30 16:22 | Estimated read: 9 min

Section 02

Introduction: Core Challenges in Precision Oncology

Identifying cancer driver genes underpins both precision oncology research and its clinical applications. These genes drive tumor initiation and progression and are key targets for targeted therapy. The field faces two fundamental challenges. First, the complex regulatory relationships between genes are hard to capture fully with a single network. Second, only a small number of genes have been experimentally validated as drivers, compared with the more than 20,000 protein-coding genes in the human genome, which leads to severe label scarcity and class imbalance. MNDGNN (Multiplex Networks-based Directed Graph Neural Network) was proposed to address both problems.

Section 03

Limitations of Traditional Methods

Most existing cancer driver gene identification methods model gene relationships with a single biological network, typically the protein-protein interaction (PPI) network. This simplification has three clear shortcomings:

Single Perspective Limitation: Gene regulation in biological systems is multi-level and multi-type. A PPI network reflects only physical interactions between proteins and misses other important dimensions, such as transcriptional regulation, signaling pathways, and kinase-substrate relationships
Lack of Directionality: Many biological interactions are inherently directional (e.g., a kinase phosphorylates its substrate, not the reverse), and an undirected graph cannot express this asymmetry
Label Scarcity Dilemma: Only several hundred genes have been experimentally validated as cancer drivers, while the human genome contains more than 20,000 protein-coding genes, so positive examples are vastly outnumbered by all other genes
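To make the directionality point concrete, here is a minimal Python/NumPy sketch on a made-up kinase-substrate toy graph (the gene names and edges are purely illustrative, not real data). It shows that a directed adjacency matrix is asymmetric, and that symmetrizing it, as an undirected graph would, erases who acts on whom.

```python
import numpy as np

# Hypothetical toy edges: (kinase, substrate). Illustrative only, not real data.
genes = ["KIN_A", "KIN_B", "SUB_C", "SUB_D"]
edges = [("KIN_A", "SUB_C"), ("KIN_A", "SUB_D"), ("KIN_B", "SUB_C")]

idx = {g: i for i, g in enumerate(genes)}
n = len(genes)

# Directed adjacency: A[i, j] = 1 means gene i acts on gene j.
A = np.zeros((n, n), dtype=int)
for src, dst in edges:
    A[idx[src], idx[dst]] = 1

# Undirected view: symmetrize, which discards the direction of every edge.
A_undirected = np.maximum(A, A.T)

out_degree = A.sum(axis=1)  # how many genes each node acts on
in_degree = A.sum(axis=0)   # how many genes act on each node

print("directed symmetric?", np.array_equal(A, A.T))
print("undirected symmetric?", np.array_equal(A_undirected, A_undirected.T))
print("out-degree:", dict(zip(genes, out_degree.tolist())))
print("in-degree: ", dict(zip(genes, in_degree.tolist())))
```

In the symmetrized matrix, KIN_A and SUB_C both have degree 2, so a kinase and its most-targeted substrate look alike by degree alone; the directed in-degree and out-degree tell them apart.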

Section 04

Opportunities from Multi-omics Data

With advances in high-throughput sequencing, multi-omics data (genomics, transcriptomics, proteomics, etc.) and biological network data have become increasingly abundant. This makes it possible to integrate information from multiple networks into a more comprehensive model of gene relationships.

Section 05

Key Innovations

MNDGNN proposes three key innovations:

Multiplex Network Integration: Combines six network types (PPI, protein complexes, KEGG pathways, RegNetwork, DawnNet, and kinase-substrate networks) in a single model
Directed Graph Convolution: Designs a dedicated directed graph convolution operation to capture neighbor diversity and degree diversity
Data Augmentation Strategy: Combines positive sample augmentation and negative sample inference to alleviate the label scarcity problem

Section 06

Model Architecture

Input Layer:

Multi-omics feature vectors (gene expression, mutation, copy number variation, etc.)
Multiplex adjacency matrices (one matrix per network type)
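A minimal NumPy sketch of the two inputs and their shapes. The omics block sizes, the random placeholder values, and the per-column z-scoring are assumptions for illustration; the source does not specify feature dimensions or preprocessing.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes = 5

# Hypothetical per-gene omics blocks filled with random placeholders (not real data).
expression = rng.normal(size=(n_genes, 3))                      # e.g. expression summaries
mutation = rng.integers(0, 2, size=(n_genes, 2)).astype(float)  # e.g. mutation indicators
cnv = rng.normal(size=(n_genes, 2))                             # e.g. copy-number values

# Concatenate the omics blocks into one feature vector per gene.
X = np.concatenate([expression, mutation, cnv], axis=1)

# Z-score each column (an assumed preprocessing step); constant columns are left at zero.
std = X.std(axis=0)
std[std == 0] = 1.0
X = (X - X.mean(axis=0)) / std

# One n_genes x n_genes adjacency matrix per network type, stacked into a 3-D tensor.
network_names = ["PPI", "ProteinComplex", "KEGG", "RegNetwork", "DawnNet", "KinaseSubstrate"]
A = np.zeros((len(network_names), n_genes, n_genes))

print(X.shape, A.shape)  # (5, 7) (6, 5, 5)
```

Stacking the adjacency matrices along a leading "network" axis keeps each network separate, so later layers can apply a different kernel to each slice.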

Directed Graph Convolution Layer:

A standard Graph Convolutional Network (GCN) assumes an undirected graph and aggregates every neighbor through the same shared weight matrix, scaled only by degree normalization. MNDGNN's directed graph convolution additionally accounts for:

Neighbor Diversity: Different types of neighbors (upstream regulators, downstream targets, interacting proteins) should be treated differently
Degree Diversity: The in-degree and out-degree of a node reflect its different roles in the network
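The source does not give the exact operator, so the following is only a hedged sketch of one common way to encode both ideas: upstream (in-) and downstream (out-) neighbors are aggregated with separate weight matrices, each normalized by the matching degree. The function name `directed_conv` and the weight names are hypothetical.

```python
import numpy as np

def directed_conv(A, X, W_self, W_in, W_out):
    """One directed graph convolution layer (an illustrative sketch, not MNDGNN's exact operator).

    A[i, j] = 1 encodes an edge i -> j, i.e. gene i is upstream of gene j.
    """
    in_deg = A.sum(axis=0)    # number of upstream neighbors of each gene
    out_deg = A.sum(axis=1)   # number of downstream neighbors of each gene

    # Degree normalization, computed separately for each direction (zero degrees guarded).
    in_norm = 1.0 / np.maximum(in_deg, 1.0)
    out_norm = 1.0 / np.maximum(out_deg, 1.0)

    upstream = in_norm[:, None] * (A.T @ X)    # mean feature of genes pointing at each gene
    downstream = out_norm[:, None] * (A @ X)   # mean feature of genes each gene points at

    # Separate weights let upstream regulators and downstream targets contribute differently.
    H = X @ W_self + upstream @ W_in + downstream @ W_out
    return np.maximum(H, 0.0)  # ReLU

# Toy graph: 0 -> 1, 0 -> 2, 1 -> 2, with one-hot features and identity weights.
A = np.array([[0, 1, 1],
              [0, 0, 1],
              [0, 0, 0]], dtype=float)
X = np.eye(3)
I = np.eye(3)
H = directed_conv(A, X, W_self=I, W_in=I, W_out=I)
print(H)
```

With identity weights the two directions are simply summed; in training, distinct `W_in` and `W_out` would be learned, which is what allows a regulator and its target to receive different representations even when their undirected degrees match.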

In the implementation, the model learns a separate convolution kernel for each network type and fuses the resulting network-specific representations through an attention mechanism.
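How the attention step might fuse the per-network embeddings (for example, the outputs of one convolution per network) is sketched below in the style of semantic-level attention from heterogeneous graph attention networks: each network's embedding matrix receives one scalar score, and the scores are softmax-normalized. The projection `W`, bias `b`, and query vector `q` stand in for learned parameters; this scoring scheme is an assumption, not the paper's specification.

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def network_attention(H_list, W, b, q):
    """Fuse per-network gene embeddings with one attention weight per network.

    H_list: list of (n_genes, d) arrays, one per network type.
    W: (d, d_att) projection, b: (d_att,) bias, q: (d_att,) query vector.
    """
    # Score each network by the average projected embedding of its genes.
    scores = np.array([np.mean(np.tanh(H @ W + b) @ q) for H in H_list])
    beta = softmax(scores)                                # attention weights, sum to 1
    fused = sum(w * H for w, H in zip(beta, H_list))      # weighted sum of embeddings
    return fused, beta

rng = np.random.default_rng(0)
n_genes, d, d_att = 4, 3, 2
H_list = [rng.normal(size=(n_genes, d)) for _ in range(6)]  # six network types
W = rng.normal(size=(d, d_att))
b = np.zeros(d_att)
q = rng.normal(size=d_att)

fused, beta = network_attention(H_list, W, b, q)
print("per-network weights:", np.round(beta, 3))
print("fused embedding shape:", fused.shape)
```

Because the weights are shared across all genes, they can be read as a rough indication of how much each network contributes to the fused representation.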

Data Augmentation Module:

To address the label scarcity problem, MNDGNN adopts a two-pronged strategy:

Positive Sample Augmentation: Expands the set of positive examples around known cancer driver genes using neighbor similarity in the network
Negative Sample Inference: Uses anomaly detection methods (e.g., those in the DeepOD library) to pick "high-confidence non-driver genes" out of the large pool of unlabeled genes and use them as negative samples
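A simplified sketch of both augmentation ideas. It deliberately swaps in simple stand-ins: Jaccard overlap of network neighbor sets serves as the "neighbor similarity" criterion, and distance from the known drivers' feature centroid replaces a DeepOD anomaly detector. The function names and the `k_pos` / `k_neg` cutoffs are hypothetical.

```python
import numpy as np

def jaccard_to_positives(A, pos_idx):
    """For every gene, the highest Jaccard overlap between its neighbor set and a known driver's."""
    N = ((A + A.T) > 0).astype(int)            # undirected neighbor indicator rows
    inter = N @ N[pos_idx].T                   # |N(i) & N(p)| for gene i, driver p
    sizes = N.sum(axis=1)
    union = sizes[:, None] + sizes[pos_idx][None, :] - inter
    jac = np.where(union > 0, inter / np.maximum(union, 1), 0.0)
    return jac.max(axis=1)

def augment_labels(A, X, pos_idx, k_pos=1, k_neg=1):
    """Return (pseudo-positive indices, reliable-negative indices) among unlabeled genes."""
    unlabeled = np.setdiff1d(np.arange(X.shape[0]), pos_idx)

    # Positive augmentation: unlabeled genes sharing the most neighbors with known drivers.
    sim = jaccard_to_positives(A, pos_idx)[unlabeled]
    pseudo_pos = unlabeled[np.argsort(-sim, kind="stable")[:k_pos]]

    # Negative inference (stand-in anomaly score): distance from the driver feature centroid.
    # Genes farthest from known drivers are kept as "high-confidence non-drivers".
    centroid = X[pos_idx].mean(axis=0)
    rest = unlabeled[~np.isin(unlabeled, pseudo_pos)]
    dist = np.linalg.norm(X[rest] - centroid, axis=1)
    reliable_neg = rest[np.argsort(-dist, kind="stable")[:k_neg]]
    return pseudo_pos, reliable_neg

# Toy example: gene 0 is the only known driver; gene 1 shares all of its neighbors.
A = np.zeros((6, 6))
for i, j in [(0, 2), (0, 3), (1, 2), (1, 3), (4, 5)]:
    A[i, j] = 1.0
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.5, 0.5],
              [0.6, 0.4], [-3.0, 2.0], [0.2, 0.2]])
pseudo_pos, reliable_neg = augment_labels(A, X, pos_idx=np.array([0]))
print(pseudo_pos.tolist(), reliable_neg.tolist())
```

Excluding freshly added pseudo-positives before choosing negatives keeps the two sets disjoint; a real anomaly detector would replace the centroid distance with a learned outlier score.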

Prediction Layer:

A Multi-Layer Perceptron (MLP) outputs the probability that each gene is a cancer driver gene, and class weights in the training loss counteract the class imbalance.
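A minimal NumPy sketch of such a prediction head: a one-hidden-layer MLP with a sigmoid output, trained with binary cross-entropy in which positives are up-weighted by `pos_weight = n_neg / n_pos`. That weighting rule is a common convention assumed here; the source only states that class weights are used. Layer sizes and the toy labels are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_predict(H, W1, b1, w2, b2):
    """Driver-gene probability for each row of the embedding matrix H."""
    hidden = np.maximum(H @ W1 + b1, 0.0)   # ReLU hidden layer
    return sigmoid(hidden @ w2 + b2)        # one probability per gene

def weighted_bce(p, y, pos_weight, eps=1e-7):
    """Binary cross-entropy with positive examples scaled by pos_weight."""
    p = np.clip(p, eps, 1.0 - eps)          # avoid log(0)
    losses = -(pos_weight * y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    return losses.mean()

y = np.array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=float)  # toy: 1 driver, 9 non-drivers
pos_weight = (y == 0).sum() / (y == 1).sum()               # 9.0

rng = np.random.default_rng(0)
H = rng.normal(size=(10, 4))                               # placeholder gene embeddings
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
w2, b2 = rng.normal(size=8), 0.0

p = mlp_predict(H, W1, b1, w2, b2)
print("pos_weight:", pos_weight)
print("weighted loss:", weighted_bce(p, y, pos_weight))
```

At a constant prediction of 0.5, the single up-weighted driver contributes as much total loss as the nine non-drivers combined, so the model cannot minimize the loss by predicting "non-driver" for everything.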

Section 07

Detailed Explanation of Network Types

MNDGNN integrates six types of biological networks:

PPI Network: Physical interactions between proteins
Protein Complex Network: Relationships between proteins that participate in the same complex
KEGG Pathway Network: Gene relationships in metabolic and signaling pathways
RegNetwork: Regulatory relationships between transcription factors and target genes
DawnNet: Disease-related gene network
Kinase-Substrate Network: Enzyme-substrate relationships in phosphorylation modification

Each network captures functional associations between genes from a different perspective; integrated, they give a more complete picture of each gene's potential role in cancer development.
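One way to turn six edge lists into a stacked adjacency input is sketched below. Which networks are treated as directed is an assumption for illustration: PPI and protein-complex co-membership are symmetric by nature, and transcription factor to target and kinase to substrate edges are directional, while the KEGG and DawnNet settings are guesses. `build_multiplex` is a hypothetical helper, and the gene identifiers and edges are toy placeholders.

```python
import numpy as np

# Assumed directedness per network (the KEGG and DawnNet choices are illustrative guesses).
NETWORKS = {
    "PPI": False,
    "ProteinComplex": False,
    "KEGG": True,
    "RegNetwork": True,        # transcription factor -> target gene
    "DawnNet": False,
    "KinaseSubstrate": True,   # kinase -> substrate
}

def build_multiplex(genes, edge_lists):
    """Stack one adjacency matrix per network; undirected networks are symmetrized."""
    idx = {g: i for i, g in enumerate(genes)}
    n = len(genes)
    A = np.zeros((len(NETWORKS), n, n))
    for k, (name, directed) in enumerate(NETWORKS.items()):
        for src, dst in edge_lists.get(name, []):
            if src not in idx or dst not in idx:
                continue  # skip genes outside the modeled gene set
            A[k, idx[src], idx[dst]] = 1.0
            if not directed:
                A[k, idx[dst], idx[src]] = 1.0
    return A

genes = ["G1", "G2", "G3"]
edge_lists = {
    "PPI": [("G1", "G2")],
    "RegNetwork": [("G2", "G3")],
    "KinaseSubstrate": [("G1", "G3"), ("G1", "G9")],  # G9 is not in the gene set
}
A = build_multiplex(genes, edge_lists)
print(A.shape)
```

Keeping the directedness flag next to each network name makes it easy to revisit the assumption for a single network without touching the rest of the pipeline.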

Section 08

Dataset

The study used the following data resources:

Multi-omics Data: Gene expression, mutation, and copy number variation data from projects such as TCGA
Validated Driver Genes: From authoritative databases such as the Cancer Gene Census
Candidate Gene Set: Genes that passed a preliminary screen as possibly cancer-related
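Before training, these resources have to be turned into per-gene labels. The sketch below assumes a three-way encoding (1 = validated driver, 0 = inferred non-driver, -1 = unlabeled) plus a mask that restricts the loss to labeled genes. `all_genes` stands in for the candidate gene set; `build_labels` and the gene symbols are hypothetical placeholders, not actual Cancer Gene Census content.

```python
import numpy as np

def build_labels(all_genes, driver_genes, inferred_negatives):
    """Encode 1 = validated driver, 0 = inferred non-driver, -1 = unlabeled."""
    drivers = set(driver_genes)
    negatives = set(inferred_negatives) - drivers   # a validated driver is never a negative
    labels = np.full(len(all_genes), -1, dtype=int)
    for i, g in enumerate(all_genes):
        if g in drivers:
            labels[i] = 1
        elif g in negatives:
            labels[i] = 0
    train_mask = labels >= 0   # only labeled genes contribute to the loss
    return labels, train_mask

all_genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"]
labels, mask = build_labels(all_genes,
                            driver_genes=["GENE_A"],
                            inferred_negatives=["GENE_D", "GENE_A"])
print(labels.tolist(), mask.tolist())
```

Letting validated drivers override any conflicting negative guess protects the scarce positive labels from being overwritten by the inference step.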
