Epi-PRS: Precise Polygenic Risk Prediction Using Genomic Large Language Models

This article introduces the Epi-PRS project, an innovative polygenic risk scoring method that converts individual DNA sequences into personalized genomic and epigenomic features using genomic large language models, providing new insights for disease risk modeling.

Tags: polygenic risk score, genomics, large language models, Enformer, precision medicine, disease risk prediction, deep learning, GWAS

Published 2026-06-17 02:11 | Recent activity 2026-06-17 02:25 | Estimated read 6 min

Section 01

Epi-PRS: Precise Polygenic Risk Prediction Using Genomic Large Language Models (Introduction)

Original Author/Maintainer: SUwonglab
Source Platform: GitHub
Original Link: https://github.com/SUwonglab/Epi-PRS
Publication Time: 2026-06-16T18:11:51Z

Epi-PRS is a polygenic risk scoring (PRS) method that uses a genomic large language model, such as DeepMind's Enformer, to convert an individual's DNA sequence into personalized genomic and epigenomic features. It targets two limitations of traditional PRS methods, namely their reliance on purely statistical associations and their neglect of gene regulatory mechanisms, and so offers a new approach to disease risk modeling.

Section 02

Project Background: Limitations of Traditional PRS Methods

Traditional PRS is built from genome-wide association study (GWAS) summary statistics and has the following limitations:

Linear assumption limitation: Ignores gene-gene interactions and non-linear effects;
Lack of functional annotation: Difficult to utilize information from non-coding regulatory variants;
Population bias: Reduced prediction accuracy in non-European populations.

Epi-PRS attempts to mitigate these issues through deep learning models.
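For reference, the traditional score that these limitations apply to is usually an additive sum: each variant's GWAS effect size multiplied by the individual's risk-allele count. The sketch below illustrates that idea only. The function name, effect sizes, and genotypes are hypothetical and are not taken from the Epi-PRS repository.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Classic additive PRS: sum over variants of effect size times allele dosage."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have the same length")
    return sum(beta * g for beta, g in zip(effect_sizes, dosages))


# Hypothetical GWAS effect sizes (log odds ratios) for three variants.
effect_sizes = [0.12, -0.05, 0.30]

# Risk-allele counts (0, 1, or 2) for two hypothetical individuals.
individual_a = [2, 0, 1]
individual_b = [0, 2, 0]

score_a = polygenic_risk_score(individual_a, effect_sizes)
score_b = polygenic_risk_score(individual_b, effect_sizes)
print(score_a, score_b)
```

Because the score is a plain weighted sum, it cannot express interactions between variants, which is the linear-assumption limitation listed above.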

Section 03

Core Technologies and Method Workflow

Core Technology: Enformer Model

Enformer is a Transformer-based model developed by DeepMind that predicts molecular phenotypes, such as chromatin accessibility, histone modifications, and gene expression, directly from DNA sequence. Its features include:

Accepts sequence input of up to 196,608 base pairs;
Multi-task prediction of 5,313 molecular phenotype tracks for the human genome;
Captures long-range sequence dependencies.
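To make the input size concrete, the following sketch cuts a 196,608 bp window centered on a position of interest and one-hot encodes it, which is the usual way DNA is presented to sequence models. It is a minimal illustration under stated assumptions: the helper names, the A, C, G, T channel order, and the choice to encode N and padding as all zeros are assumptions of this sketch, not details confirmed by the Epi-PRS repository.

```python
ENFORMER_INPUT_LENGTH = 196_608  # input length in base pairs, as listed above

# Assumed channel order: A, C, G, T.
ONE_HOT = {
    "A": (1.0, 0.0, 0.0, 0.0),
    "C": (0.0, 1.0, 0.0, 0.0),
    "G": (0.0, 0.0, 1.0, 0.0),
    "T": (0.0, 0.0, 0.0, 1.0),
}
UNKNOWN = (0.0, 0.0, 0.0, 0.0)  # used for N and for padding


def centered_window(sequence, center, length=ENFORMER_INPUT_LENGTH):
    """Return `length` bases centered on 0-based index `center`, padded with N."""
    if not 0 <= center < len(sequence):
        raise ValueError("center must be an index inside the sequence")
    start = center - length // 2
    end = start + length
    left_pad = "N" * max(0, -start)
    right_pad = "N" * max(0, end - len(sequence))
    return left_pad + sequence[max(0, start):min(len(sequence), end)] + right_pad


def one_hot_encode(window):
    """Encode a DNA string as a list of 4-tuples, one per base."""
    return [ONE_HOT.get(base, UNKNOWN) for base in window.upper()]


# Tiny demonstration with a short window so the result is easy to inspect.
demo_window = centered_window("ACGTACGT", center=1, length=6)
demo_encoded = one_hot_encode(demo_window)
print(demo_window, len(demo_encoded))
```

In a personalized setting, the window would be built from the individual's own sequence (reference genome with that person's variants applied) rather than from the reference alone.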

Epi-PRS Workflow:

Individual genomic feature extraction: Extract Enformer features from DNA sequences of target regions;
Epigenomic feature engineering: Cross-cell type aggregation, functional region weighting, dimensionality reduction;
Risk prediction model training: Train prediction models using linear models, elastic net, or gradient boosting trees.
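The feature engineering step can be sketched roughly as follows. The example collapses a bins-by-tracks prediction matrix into one value per assay by averaging over cell types and weighting sequence bins by functional importance. The function name, the `track_groups` and `bin_weights` structures, and all numbers are hypothetical; the actual aggregation used by Epi-PRS may differ, and dimensionality reduction is left out for brevity.

```python
def aggregate_features(predictions, track_groups, bin_weights):
    """Collapse a bins x tracks prediction matrix into one value per assay group.

    predictions : list of rows, one per sequence bin, each a list of track values
    track_groups: dict mapping an assay name to the column indices of its tracks
                  (one column per cell type)
    bin_weights : one non-negative weight per bin, e.g. higher near a promoter
    """
    total_weight = sum(bin_weights)
    if total_weight <= 0:
        raise ValueError("bin_weights must sum to a positive value")
    features = {}
    for assay, columns in track_groups.items():
        weighted_sum = 0.0
        for row, weight in zip(predictions, bin_weights):
            # Cross-cell-type aggregation: mean over this assay's tracks.
            cell_type_mean = sum(row[c] for c in columns) / len(columns)
            # Functional region weighting: scale each bin by its weight.
            weighted_sum += weight * cell_type_mean
        features[assay] = weighted_sum / total_weight
    return features


# Hypothetical predictions: 2 bins x 3 tracks (two DNase cell types, one CAGE).
predictions = [
    [1.0, 3.0, 0.5],
    [2.0, 4.0, 1.5],
]
track_groups = {"DNase": [0, 1], "CAGE": [2]}
bin_weights = [1.0, 3.0]  # the second bin is treated as more important

features = aggregate_features(predictions, track_groups, bin_weights)
print(features)
```

The resulting per-individual feature vector is what a downstream linear model, elastic net, or gradient boosting model would be trained on.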

Section 04

Technical Advantages and Application Scenarios

Technical Advantages:

Biological interpretability: Features correspond to clear molecular phenotypes;
Utilizes non-coding variants: Draws on the non-coding portion of the genome, which makes up roughly 98% of human DNA;
Integrates rare variants: Learns from complete sequences;
Cross-population generalization: Trained on diverse data.

Application Scenarios:

Disease risk stratification: E.g., early screening for individuals at high risk of breast cancer;
Pharmacogenomics: Guides personalized medication;
Complex disease research: Identifies risk genes and regulatory pathways.

The project repository includes an example of breast cancer risk prediction.
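Risk stratification typically means placing an individual's score within a reference distribution and assigning a tier. The sketch below shows one simple way to do that. The function names, the reference scores, and the 90th and 20th percentile cutoffs are hypothetical illustration values, not clinical thresholds and not part of the Epi-PRS repository.

```python
def percentile_rank(score, reference_scores):
    """Percentage of reference scores that are less than or equal to `score`."""
    if not reference_scores:
        raise ValueError("reference_scores must not be empty")
    at_or_below = sum(1 for s in reference_scores if s <= score)
    return 100.0 * at_or_below / len(reference_scores)


def risk_tier(score, reference_scores, high_cutoff=90.0, low_cutoff=20.0):
    """Map a score to a coarse tier using percentile cutoffs (assumed values)."""
    pct = percentile_rank(score, reference_scores)
    if pct >= high_cutoff:
        return "high"
    if pct <= low_cutoff:
        return "low"
    return "intermediate"


# Hypothetical predicted risk scores from a reference cohort.
reference = [0.10, 0.15, 0.22, 0.30, 0.35, 0.41, 0.48, 0.55, 0.63, 0.90]

tiers = {score: risk_tier(score, reference) for score in (0.12, 0.40, 0.90)}
print(tiers)
```

Any real cutoff would need to be calibrated and validated in the target population before being used for screening decisions.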

Section 05

Limitations and Challenges

Epi-PRS faces the following challenges:

High computational cost: Enformer inference requires a lot of resources;
Large feature dimension: Prone to overfitting;
Causal inference problem: Only identifies statistical associations rather than causal effects;
Model update requirement: Reference genome and cell type information must be refreshed as new data become available.
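The overfitting risk from a large feature dimension is commonly handled with regularization, and elastic net is one of the model choices named in the workflow. Below is a minimal sketch of logistic regression with an elastic net penalty, fit by plain (sub)gradient descent on toy data. The function names, hyperparameters, and data are hypothetical; a production setup would more likely use an established library with a proximal or coordinate descent solver, which can drive weights exactly to zero where this simple loop only shrinks them.

```python
import math


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sign(w):
    return (w > 0) - (w < 0)


def fit_elastic_net_logistic(X, y, alpha=0.1, l1_ratio=0.5, lr=0.1, epochs=500):
    """Logistic regression with an elastic net penalty, fit by (sub)gradient descent.

    Penalty: alpha * (l1_ratio * sum|w| + 0.5 * (1 - l1_ratio) * sum w^2).
    The intercept b is not penalized.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, label in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - label
            for j in range(p):
                grad_w[j] += err * row[j] / n
            grad_b += err / n
        for j in range(p):
            penalty_grad = alpha * (l1_ratio * sign(w[j]) + (1 - l1_ratio) * w[j])
            w[j] -= lr * (grad_w[j] + penalty_grad)
        b -= lr * grad_b
    return w, b


# Toy data: feature 0 separates the classes, feature 1 is close to noise.
X = [[1.0, 0.2], [0.9, -0.1], [1.2, 0.1], [-1.0, 0.1], [-1.1, -0.2], [-0.8, 0.15]]
y = [1, 1, 1, 0, 0, 0]

w, b = fit_elastic_net_logistic(X, y, alpha=0.1)
w_strong, _ = fit_elastic_net_logistic(X, y, alpha=1.0)
print(w, w_strong)
```

Raising `alpha` shrinks the weights more aggressively, trading some fit on the training data for less sensitivity to uninformative features.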

Section 06

Future Development Directions

Future directions of Epi-PRS:

More powerful base models: Support longer sequences and more prediction tasks;
Multi-omics integration: Combine transcriptome, proteome, and other data;
Causal inference methods: Distinguish between correlation and causation;
Clinical translation research: Validate clinical utility and cost-effectiveness.

Section 07

Conclusion

Epi-PRS integrates deep learning and genomics, opening up a new path for polygenic risk prediction. It aims not only to improve prediction accuracy but also to deepen understanding of the genetic mechanisms of disease. As data accumulate and computing power improves, such methods could play an important role in precision medicine, giving researchers and clinicians more accurate risk assessment tools.
