Zing Forum

GenoME: A Multimodal Genomic Prediction and Perturbation Analysis Model Based on MoE Architecture

GenoME is a generative model built on the Mixture of Experts (MoE) architecture. It integrates DNA sequences with cell type-specific chromatin accessibility data to achieve unified cross-scale and cross-modal genomic prediction.

Tags: Genomics, MoE, Mixture of Experts, Multi-modal, ATAC-seq, Epigenomics, Deep Learning, Bioinformatics
Published 2026-05-24 12:11 | Recent activity 2026-05-24 12:23 | Estimated read 8 min

Section 01

GenoME: A Multimodal Genomic Prediction and Perturbation Analysis Model Based on MoE Architecture (Introduction)

GenoME is a generative model based on the Mixture of Experts (MoE) architecture, released by JWei2015 on GitHub. It integrates DNA sequences with cell type-specific chromatin accessibility data to deliver unified cross-scale (base pair to kilobase) and cross-modal genomic prediction, and it supports computational perturbation analysis.

Source: GitHub (https://github.com/JWei2015/GenoME), Release time: 2026-05-24T04:11:04Z

Section 02

Multimodal Challenges in Genomics (Background)

Genomics research faces a core challenge: how to integrate massive data from different experimental techniques and biological scales into a unified prediction framework. Traditional methods are often limited to a single modality (for example, only gene expression or only chromatin structure) and struggle to capture the complex network of genomic regulation. GenoME addresses this by using the MoE architecture to combine DNA sequences with cell type-specific chromatin accessibility data (ATAC-seq/DNase-seq), enabling cross-scale and cross-modal genomic prediction.

Section 03

Core Architecture: Innovative Application of the MoE Model

The Mixture of Experts (MoE) architecture routes tasks to different expert sub-networks, balancing computational efficiency and model capacity. GenoME applies it innovatively to genomics:

  • DNA sequence expert: Processes raw genomic sequences
  • Chromatin accessibility expert: Analyzes ATAC-seq/DNase-seq data
  • Multimodal fusion expert: Integrates sequence and epigenetic information
  • Cross-scale prediction expert: Outputs multi-level results, from base-pair-resolution signals to kilobase-resolution 3D chromatin structure

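This design aims to preserve prediction accuracy while avoiding the computational redundancy of a single giant network, because each input is handled by the relevant experts rather than by every parameter in the model.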
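The source does not describe GenoME's routing in detail, so the following is only a minimal sketch of generic top-k MoE gating, written with the Python standard library. The class name `ToyMoE`, the linear experts, the random weights, and the `top_k` parameter are all assumptions made for illustration and are not taken from the GenoME repository.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


class ToyMoE:
    """Top-k gated mixture of linear experts (hypothetical, illustrative only)."""

    def __init__(self, n_experts, dim, top_k=2, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        # One gating vector and one weight vector per expert (assumed shapes).
        self.gates = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_experts)]
        self.experts = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_experts)]

    def route(self, x):
        """Return (expert_index, weight) pairs for the top-k experts."""
        probs = softmax([dot(g, x) for g in self.gates])
        ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
        chosen = ranked[: self.top_k]
        norm = sum(probs[i] for i in chosen)
        # Renormalize so the selected weights sum to 1.
        return [(i, probs[i] / norm) for i in chosen]

    def forward(self, x):
        """Weighted sum of the selected experts; unselected experts are never evaluated."""
        return sum(w * dot(self.experts[i], x) for i, w in self.route(x))


moe = ToyMoE(n_experts=4, dim=3, top_k=2)
features = [0.5, -1.0, 2.0]
routing = moe.route(features)
output = moe.forward(features)
```

Only two of the four experts run for this input, which is where the efficiency argument comes from: capacity grows with the number of experts, while the per-input cost grows with `top_k`.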

Section 04

Multimodal Prediction Capabilities and Cross-Cell Type Generalization

Multimodal Prediction Capabilities

  • Epigenomics: Predicts chromatin modification states and transcription factor binding sites at base-pair resolution, helping to understand gene regulatory mechanisms and identify functional non-coding regions.
  • Transcriptomics: Predicts gene expression levels (mRNA abundance, isoform patterns) and captures transcriptional regulatory logic through chromatin accessibility information.
  • 3D chromatin structure: Predicts topologically associating domains (TADs) and chromatin loops at kilobase resolution to understand long-range interactions.
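To make the two resolutions above concrete, here is a small sketch that averages a per-base signal into fixed-size bins. It is an illustration of the scale change only, not GenoME's pipeline; the function name `bin_signal` is hypothetical, and a 4-base bin stands in for what would be a 1,000-base bin in practice.

```python
def bin_signal(signal, bin_size):
    """Average a per-base signal into consecutive bins; the last bin may be shorter."""
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    binned = []
    for start in range(0, len(signal), bin_size):
        window = signal[start:start + bin_size]
        binned.append(sum(window) / len(window))
    return binned


# Ten per-base values reduced to three coarse bins (toy numbers).
per_base = [0.0, 2.0, 4.0, 6.0, 1.0, 1.0, 1.0, 1.0, 5.0, 7.0]
binned = bin_signal(per_base, bin_size=4)
```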

Cross-Cell Type Generalization

GenoME uses cell type embeddings, conditional generation, and meta-learning strategies to predict regulatory landscapes for unseen cell types, which supports personalized medicine and rare cell type research.

Section 05

Computational Perturbation Analysis Function

GenoME supports in silico (computational simulation) perturbation analysis, which can simulate:

  • Genetic variations: DNA sequence changes such as insertions, deletions, and substitutions
  • Epigenetic perturbations: Altering chromatin accessibility in specific regions
  • Combined perturbations: Simultaneously simulating the effects of multiple changes

Comparing predictions before and after a perturbation helps identify functional regulatory connections, suggests candidate causal relationships, and provides guidance for experimental design.
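The before-and-after comparison can be sketched as follows. This is a minimal example under stated assumptions: `toy_predict` is a hypothetical stand-in that counts a motif and takes the place of a trained model, and the helper names are invented for illustration rather than taken from GenoME.

```python
def substitute(seq, pos, base):
    """Replace the base at 0-based position pos."""
    return seq[:pos] + base + seq[pos + 1:]


def insert(seq, pos, bases):
    """Insert bases before 0-based position pos."""
    return seq[:pos] + bases + seq[pos:]


def delete(seq, pos, length):
    """Delete length bases starting at 0-based position pos."""
    return seq[:pos] + seq[pos + length:]


def toy_predict(seq, motif="TATA"):
    """Hypothetical stand-in for a trained model: counts motif occurrences."""
    width = len(motif)
    return sum(1 for i in range(len(seq) - width + 1) if seq[i:i + width] == motif)


def perturbation_effect(seq, perturbed, predict=toy_predict):
    """Prediction after the perturbation minus prediction before it."""
    return predict(perturbed) - predict(seq)


reference = "GGCTATAAGGCC"
snv = substitute(reference, 4, "G")  # breaks the existing motif
effects = {
    "substitution": perturbation_effect(reference, snv),
    "insertion": perturbation_effect(reference, insert(reference, 10, "TATA")),
    "deletion": perturbation_effect(reference, delete(reference, 3, 4)),
    # Combined perturbation: the substitution followed by the insertion.
    "combined": perturbation_effect(reference, insert(snv, 10, "TATA")),
}
```

In this toy case the combined perturbation has a net effect of zero because the lost motif is offset by the inserted one, which shows why combined changes are worth simulating jointly rather than summing single-change effects by hand.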

Section 06

Technical Implementation and Data Formats

Technical Implementation

GenoME is built on PyTorch 2.0+ and PyTorch Lightning and supports CUDA acceleration. Its dependencies include:

  • Sequence processing: kipoiseq
  • Genomic data: pyBigWig (BigWig files), cooler/cooltools (Hi-C data)
  • Training management: PyTorch Lightning (distributed training, experiment management)

Input Data Formats

Data Type                 Format   Description
DNA sequence              FASTA    hg38 reference genome
Chromatin accessibility   BigWig   Base-pair resolution
Expression data           BigWig   RNA-seq signal
3D structure              cooler   Hi-C contact matrix
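
As a rough sketch of how a sequence and an accessibility track might be combined into one model input, the example below one-hot encodes DNA and appends the per-base signal as a fifth channel. The channel layout and the function names are assumptions for illustration. In practice the values would be read from FASTA and BigWig files (for example with kipoiseq and pyBigWig) rather than typed in as plain lists.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode DNA as rows of four channels (A, C, G, T); N and other symbols become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def build_input(seq, accessibility):
    """Append the per-base accessibility signal as a fifth channel (assumed layout)."""
    if len(seq) != len(accessibility):
        raise ValueError("sequence and signal must cover the same positions")
    return [row + [float(sig)] for row, sig in zip(one_hot(seq), accessibility)]


# Toy values standing in for a FASTA window and a matching BigWig window.
sequence = "ACGTN"
atac_signal = [0.0, 1.5, 3.0, 0.5, 0.0]
model_input = build_input(sequence, atac_signal)
```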

Section 07

Application Scenarios and Prospects

GenoME opens up new possibilities for computational biology and precision medicine:

  • Disease mechanism research: Simulates disease-related genetic variations and epigenetic changes to explore molecular mechanisms.
  • Drug target discovery: Identifies key regulatory elements and transcription factors, providing candidate targets.
  • Personalized genomics: Predicts specific regulatory landscapes based on individual genomic data, supporting precision medicine.
  • Rare cell type research: Predicts regulatory features of hard-to-obtain rare cell types, guiding experimental design.

Section 08

Conclusion: Important Progress at the Intersection of AI and Genomics

GenoME represents an important advance at the intersection of AI and genomics. It brings the MoE architecture and multimodal learning to genomic prediction, offering a new approach to complex biological prediction problems. As single-cell sequencing becomes more widespread and computing power improves, such multimodal models are likely to play a larger role in life science research.