Zing Forum


AcmGENTIC: An End-to-End Solution for Automatically Mining Functional Evidence of Genomic Variants Using Large Language Models

One of the biggest bottlenecks in clinical genomics is converting experimental evidence scattered across a vast literature into structured data that can support variant pathogenicity interpretation. The AcmGENTIC system introduced in this article uses large language models (LLMs) to automate the full process (abstract screening, full-text evidence extraction and classification, and evidence summary generation). It reaches 96% evidence-classification accuracy on a ClinGen benchmark and provides a scalable technical framework for evidence management in precision medicine.

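Tags: Genomic variants · Functional evidence · Large language models · Precision medicine · Literature mining · ClinGen · ACMG guidelines · Clinical genomics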
Published 2026-03-31 23:08 · Recent activity 2026-04-02 09:48 · Estimated read 6 min

Section 01

AcmGENTIC: An End-to-End Solution for Automatically Mining Functional Evidence of Genomic Variants Using LLMs (Introduction)

Clinical genomics faces a bottleneck in converting experimental evidence from a vast literature into structured data for variant pathogenicity interpretation, and most variants remain Variants of Uncertain Significance (VUS). AcmGENTIC uses large language models to automate the full process (abstract screening, full-text evidence extraction and classification, and evidence summary generation), reaches 96% evidence-classification accuracy on a ClinGen benchmark, and provides a scalable technical framework for evidence management in precision medicine.


Section 02

Background: Evidence Dilemma in Precision Medicine

In the era of precision medicine, genomic sequencing has become routine, but most of the variants it uncovers are VUS, and resolving them requires integrating multiple lines of evidence, such as functional experiments and population frequency. Functional evidence is scattered across tens of thousands of publications; curating it manually is time-consuming, labor-intensive, and hard to scale. Traditional literature mining relies on keyword matching, which struggles with complex biomedical context. LLM-based approaches must solve two core problems: accurately identifying the relevant literature and extracting structured evidence from it.


Section 03

Research Design: Benchmark Testing Based on ClinGen

A benchmark dataset was constructed from ClinGen expert annotations, extracting PubMed identifiers, evidence labels, and related fields to form "variant-literature" pairs. Two models were evaluated, gpt-4o-mini (a non-reasoning model) and o4-mini (a reasoning model), on a task split into two stages: abstract screening (judging whether a paper reports functional experiments on the specific variant) and full-text evidence extraction and classification (extracting the evidence direction, strength, and experiment type).
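To make the "variant-literature pair" concrete, here is a minimal Python sketch of how such benchmark items might be represented, with one field for the stage-1 target and three for the stage-2 targets. The schema (class `VariantLiteraturePair`, function `build_pairs`, and the row keys `variant`, `pmid`, `direction`, `strength`, `experiment_type`) is hypothetical and not taken from AcmGENTIC; so is the assumption that a row without an evidence direction counts as a stage-1 negative.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class VariantLiteraturePair:
    """One benchmark item (hypothetical schema): a variant, a paper, and expert labels."""
    variant: str                   # variant identifier, e.g. an HGVS expression
    pmid: str                      # PubMed identifier, stored as a string
    has_functional_evidence: bool  # stage 1 target: does the paper report a functional assay on this variant?
    direction: Optional[str] = None        # stage 2 target, e.g. "pathogenic" or "benign"
    strength: Optional[str] = None         # stage 2 target, e.g. "supporting", "moderate", "strong"
    experiment_type: Optional[str] = None  # stage 2 target: assay category


def build_pairs(rows: Iterable[dict]) -> list[VariantLiteraturePair]:
    """Turn annotation rows into deduplicated (variant, paper) pairs."""
    seen: set[tuple[str, str]] = set()
    pairs: list[VariantLiteraturePair] = []
    for row in rows:
        pmid = str(row["pmid"])          # normalize int/str PMIDs before deduplication
        key = (row["variant"], pmid)
        if key in seen:                  # keep only the first label per (variant, paper)
            continue
        seen.add(key)
        direction = row.get("direction")
        pairs.append(VariantLiteraturePair(
            variant=row["variant"],
            pmid=pmid,
            # ASSUMPTION: no evidence direction means no functional evidence for this variant
            has_functional_evidence=direction is not None,
            direction=direction,
            strength=row.get("strength"),
            experiment_type=row.get("experiment_type"),
        ))
    return pairs
```

Keying on the normalized (variant, PMID) tuple keeps one label per paper even when the same citation appears twice in the source annotations.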


Section 04

Evidence: Results of Abstract Screening

In abstract screening, both models achieved high recall (0.88-0.90) but low specificity (0.59-0.65). This "better to include than to miss" behavior is reasonable for a first pass: initial screening secures recall, and the subsequent full-text analysis filters out the false positives. The main limitation is that, from the abstract alone, the models find it hard to judge whether an experiment actually targets the variant in question, so this must be verified in a later stage.
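Recall and specificity pull in different directions here: recall is TP / (TP + FN), the share of truly relevant papers that pass the screen, while specificity is TN / (TN + FP), the share of irrelevant papers that are rejected. A minimal sketch of computing these from binary screening labels follows; the function name `screening_metrics` is illustrative, and the example numbers in the usage note are made up, not the article's data.

```python
def screening_metrics(gold: list[bool], pred: list[bool]) -> dict[str, float]:
    """Confusion-matrix metrics for a binary screen (True = 'paper is relevant')."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(g and p for g, p in zip(gold, pred))          # relevant, kept
    fn = sum(g and not p for g, p in zip(gold, pred))      # relevant, dropped (the costly error)
    tn = sum(not g and not p for g, p in zip(gold, pred))  # irrelevant, dropped
    fp = sum(not g and p for g, p in zip(gold, pred))      # irrelevant, kept (fixed later by full-text review)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"recall": recall, "specificity": specificity, "precision": precision, "f1": f1}
```

For instance, with 10 relevant and 10 irrelevant papers, a screen that keeps 9 relevant and 4 irrelevant ones has recall 0.9 and specificity 0.6: only one relevant paper is lost, and the four false positives are left for the full-text stage to reject.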


Section 05

Evidence: Advantages of Full-Text Evidence Extraction

Once a "variant matching gate" was introduced, o4-mini performed markedly better: 96% evidence-classification accuracy, a specificity of 0.83 (versus only 0.37 for gpt-4o-mini), and an F1 score of 0.98. An LLM-as-judge evaluation also rated o4-mini's generated summaries as higher quality, and this evaluation setup offers a framework for assessing future model iterations.
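The article does not describe how the "variant matching gate" is implemented, so the following is only a rule-based stand-in to illustrate the idea: classification runs only when the full text actually mentions the target variant under one of its known aliases, and otherwise the paper is reported as carrying no evidence for that variant. The function `variant_matching_gate` and the `classify` callable (standing in for the LLM extraction step) are hypothetical; the real gate may well be LLM-based.

```python
import re
from typing import Callable


def variant_matching_gate(full_text: str, variant_aliases: set[str],
                          classify: Callable[[str], dict]) -> dict:
    """Hypothetical gate: classify evidence only if the target variant is mentioned."""
    hits = [
        alias for alias in variant_aliases
        # whole-token match, so "R117H" does not fire inside a longer token like "R117HX"
        if re.search(r"(?<![A-Za-z0-9])" + re.escape(alias) + r"(?![A-Za-z0-9])", full_text)
    ]
    if not hits:
        # gate closed: skip the (expensive) classifier and report no evidence for this variant
        return {"matched": False, "aliases_found": [], "evidence": None}
    return {"matched": True, "aliases_found": sorted(hits), "evidence": classify(full_text)}
```

Placing the check before classification means that off-target experiments, the weakness noted for abstract screening, are filtered out instead of being classified as evidence, which is consistent with the specificity gain reported above.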


Section 06

End-to-End Process of the AcmGENTIC System

The AcmGENTIC process includes five steps:
1. Variant identifier expansion (converting an HGVS expression into its multiple equivalent forms);
2. Intelligent literature retrieval (obtaining metadata and full text from PubMed and other sources);
3. LLM abstract screening (initial screening with lightweight models);
4. Multimodal evidence extraction (full-text PDF analysis, including chart parsing);
5. Structured report generation (for expert review).
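Step 1 matters because papers refer to the same variant in different notations, so a search on a single string misses papers. A minimal sketch for the simplest case, a protein-level missense substitution, maps HGVS three-letter amino acid codes to their one-letter equivalents and emits several common spellings. The function `expand_protein_variant` is hypothetical and much narrower than whatever AcmGENTIC actually does; it does not handle cDNA (`c.`) notation, legacy numbering, or `X` for stop codons.

```python
import re

# Standard three-letter -> one-letter amino acid codes, plus the stop codon (Ter -> *)
AA3_TO_1 = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Ter": "*",
}

# Matches "p.Arg117His" and the predicted-consequence form "p.(Arg117His)"
_PROTEIN_SUB = re.compile(r"^p\.\(?([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})\)?$")


def expand_protein_variant(hgvs_p: str) -> set[str]:
    """Hypothetical expansion of a protein substitution into common search aliases."""
    text = hgvs_p.strip()
    m = _PROTEIN_SUB.match(text)
    if m is None:
        return {text}  # unsupported notation: fall back to the input unchanged
    ref3, pos, alt3 = m.groups()
    if ref3 not in AA3_TO_1 or alt3 not in AA3_TO_1:
        return {text}  # not a recognized amino acid code
    ref1, alt1 = AA3_TO_1[ref3], AA3_TO_1[alt3]
    return {
        f"p.{ref3}{pos}{alt3}", f"{ref3}{pos}{alt3}",  # three-letter forms
        f"p.{ref1}{pos}{alt1}", f"{ref1}{pos}{alt1}",  # one-letter forms
    }
```

For example, `expand_protein_variant("p.Arg117His")` yields `p.Arg117His`, `Arg117His`, `p.R117H`, and `R117H`, each of which could then serve as a retrieval query or as an alias for downstream variant matching.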


Section 07

Technical Insights and Clinical Significance

Technical insights: AcmGENTIC adopts a "human-in-the-loop" approach in which the LLM handles the tedious work while experts review the decisions, so each side contributes its strengths. Clinical significance: the system addresses the growing variant-annotation burden created by rising demand for genomic sequencing, and its human-machine collaboration model balances automation with accuracy, offering a practical path forward for precision medicine.


Section 08

Limitations and Future Directions

Limitations: the training data, drawn from ClinGen, may carry domain bias; only English-language literature is processed; and parsing of complex charts needs improvement. Future directions: expand the data to cover more diseases and variants, optimize fine-tuning strategies, strengthen chart understanding, and establish an expert feedback mechanism for continuous iteration.