# AROMA: Multimodal Enhanced Reasoning Enables Accurate Prediction of Gene Perturbations in Virtual Cells

> AROMA integrates textual evidence, graph topology, and protein sequence features, and achieves accurate and interpretable prediction of gene perturbations in virtual cells through a two-stage optimization strategy, maintaining robustness in zero-shot and long-tail scenarios.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-04-22T07:10:45.000Z
- Last activity: 2026-04-23T01:58:28.533Z
- Popularity: 141.2
- Keywords: virtual cells, gene perturbation, multimodal learning, computational biology, explainable AI, knowledge graphs, systems biology, drug discovery
- Page link: https://www.zingnex.cn/en/forum/thread/aroma
- Canonical: https://www.zingnex.cn/forum/thread/aroma
- Markdown source: floors_fallback

---

## Introduction: AROMA and Multimodal Enhanced Reasoning for Gene Perturbation Prediction in Virtual Cells

AROMA is a multimodal enhanced reasoning framework for predicting the effects of gene perturbations in virtual cells. It integrates textual evidence, graph topology, and protein sequence features, and it is trained in two stages: knowledge pre-training followed by task fine-tuning. This combination is reported to yield predictions that are both accurate and interpretable, and that remain robust in zero-shot and long-tail scenarios. The work is relevant to biomedical applications such as drug discovery and disease mechanism research.

## Background: Value of Virtual Cells and Bottlenecks of Existing Methods

Building virtual cells, computational models that simulate the molecular states and behaviors of cells, is a core goal of computational biology. Gene perturbation modeling is one of their key uses, with applications in drug target discovery, disease mechanism research, synthetic biology design, and precision medicine. Existing methods have three major bottlenecks: unconstrained reasoning can violate biological laws, predictions are not interpretable, and retrieval signals are only weakly aligned with regulatory topology.

## Methodology: Multimodal Architecture and Two-Stage Optimization of AROMA

The core design idea of AROMA is to integrate heterogeneous knowledge from multiple sources and reason over it explicitly. The model processes three types of information: textual evidence (scientific literature, database descriptions, and similar sources), graph topology (gene regulatory networks, protein-protein interaction networks, and similar graphs), and protein sequence features. The architecture consists of a text encoder (BioBERT/PubMedBERT), a graph neural network, a sequence encoder (ESM), a cross-modal fusion module, and a reasoning module. Training follows a two-stage strategy: knowledge pre-training on large-scale knowledge graphs and literature data, followed by task fine-tuning on specific gene perturbation datasets, during which interpretability constraints are introduced.
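To make the idea of a cross-modal fusion module more concrete, here is a minimal sketch in plain Python. It is not AROMA's implementation: the thread does not describe how fusion is performed, so scaled dot-product attention is assumed here as one common choice. The function `fuse_modalities`, the fixed `query` vector (a stand-in for a learned perturbation query), and the toy embeddings are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fuse_modalities(
    query: Vector, embeddings: Dict[str, Vector]
) -> Tuple[Vector, Dict[str, float]]:
    """Attention-weighted sum of per-modality embeddings (assumed fusion scheme).

    Returns the fused vector and the attention weight given to each modality.
    """
    names = sorted(embeddings)
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, embeddings[n]) / scale for n in names])
    fused = [
        sum(w * embeddings[n][i] for w, n in zip(weights, names))
        for i in range(len(query))
    ]
    return fused, dict(zip(names, weights))


# Toy 4-dimensional embeddings standing in for the outputs of the text
# encoder, the graph neural network, and the sequence encoder.
embeddings = {
    "text": [0.9, 0.1, 0.0, 0.2],
    "graph": [0.2, 0.8, 0.1, 0.0],
    "sequence": [0.1, 0.1, 0.7, 0.3],
}
query = [1.0, 0.5, 0.0, 0.0]  # hypothetical perturbation query vector

fused, weights = fuse_modalities(query, embeddings)
print(weights)
```

The per-modality weights returned alongside the fused vector give a rough view of which information source dominated a prediction, which is one simple way a fusion step can support interpretability.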

## Data Resources: PerturbReason Dataset and Knowledge Graphs

The research team contributed two major data resources:

1. The PerturbReason dataset: over 498,000 samples, each including perturbation information, context, effect descriptions, reasoning chains, and evidence sources.
2. Knowledge graphs: gene regulatory graphs, which encode gene regulatory relationships, transcription factor-target gene associations, and related links; and functional annotation graphs, which integrate GO annotations, KEGG pathways, and similar resources.
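The fields listed for a PerturbReason sample can be pictured as a small record type. The sketch below is an assumption about shape only: the actual schema is not given in this thread, so every class name, field name, and value here (including the placeholder genes `GENE_A` to `GENE_D`) is hypothetical. It also shows one way a reasoning chain could be checked against a gene regulatory graph.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class ReasoningStep:
    regulator: str  # upstream gene claimed in this step
    target: str     # downstream gene claimed in this step
    statement: str  # free-text justification for the step


@dataclass
class PerturbationSample:
    perturbed_gene: str
    perturbation_type: str   # e.g. "knockdown"
    cell_context: str        # e.g. a cell line name
    effect_description: str
    reasoning_chain: List[ReasoningStep] = field(default_factory=list)
    evidence_sources: List[str] = field(default_factory=list)


# Regulatory graph as an adjacency map: regulator -> set of target genes.
RegulatoryGraph = Dict[str, Set[str]]


def unsupported_steps(
    sample: PerturbationSample, graph: RegulatoryGraph
) -> List[ReasoningStep]:
    """Return reasoning steps whose regulator -> target edge is not in the graph."""
    return [
        step
        for step in sample.reasoning_chain
        if step.target not in graph.get(step.regulator, set())
    ]


graph: RegulatoryGraph = {"GENE_A": {"GENE_B"}, "GENE_B": {"GENE_C"}}
sample = PerturbationSample(
    perturbed_gene="GENE_A",
    perturbation_type="knockdown",
    cell_context="HeLa",
    effect_description="GENE_C expression decreases",
    reasoning_chain=[
        ReasoningStep("GENE_A", "GENE_B", "GENE_A activates GENE_B"),
        ReasoningStep("GENE_B", "GENE_C", "GENE_B activates GENE_C"),
        ReasoningStep("GENE_A", "GENE_D", "claimed edge absent from the graph"),
    ],
    evidence_sources=["placeholder-reference-1"],
)

missing = unsupported_steps(sample, graph)
print([(s.regulator, s.target) for s in missing])
```

Keeping each reasoning step as a structured edge plus free text makes it possible to audit a chain automatically, while the text remains readable for a human reviewer.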

## Experimental Validation: Multi-Dimensional Performance Evaluation Results

AROMA was evaluated along several dimensions:

- Across cell lines: it is reported to outperform existing methods on tumor-derived lines such as HeLa, A549, and HepG2, on the immortalized line HEK293, and on stem cell lines.
- Zero-shot generalization: it remains robust on cell lines not seen during training.
- Long-tail scenarios: it is highly competitive on rare genes and on genes with sparse prior knowledge.
- Interpretability: it scores well on biological plausibility, evidence support, and completeness of its reasoning.
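The zero-shot and long-tail settings above can be illustrated with a simple data split. This is a sketch under stated assumptions, not the paper's evaluation protocol: the thread gives no split details, so the functions `zero_shot_split` and `long_tail_subset`, the `max_count` threshold, and the placeholder genes are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Record = Tuple[str, str]  # (cell_line, perturbed_gene)


def zero_shot_split(
    records: Iterable[Record], held_out_cell_lines: Set[str]
) -> Tuple[List[Record], List[Record]]:
    """Hold out entire cell lines so the test set contains only unseen lines."""
    train: List[Record] = []
    test: List[Record] = []
    for record in records:
        (test if record[0] in held_out_cell_lines else train).append(record)
    return train, test


def long_tail_subset(
    train: List[Record], test: List[Record], max_count: int
) -> List[Record]:
    """Keep test records whose gene appears at most `max_count` times in training."""
    counts = Counter(gene for _, gene in train)
    # Counter returns 0 for genes that never occur in the training set.
    return [record for record in test if counts[record[1]] <= max_count]


records: List[Record] = [
    ("HeLa", "GENE_A"),
    ("HeLa", "GENE_B"),
    ("A549", "GENE_A"),
    ("A549", "GENE_A"),
    ("HEK293", "GENE_A"),
    ("HEK293", "GENE_C"),
    ("HepG2", "GENE_B"),
]

train, test = zero_shot_split(records, held_out_cell_lines={"HEK293"})
rare = long_tail_subset(train, test, max_count=1)
print(len(train), len(test), rare)
```

Splitting by whole cell line, rather than by random sample, is what prevents information about the held-out line from leaking into training.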

## Technical Advantages: Distinct Features of AROMA Compared to Existing Methods

Compared with existing methods, AROMA is described as having five advantages. It integrates multimodal knowledge more comprehensively. Its explicit reasoning produces interpretable chains, which makes predictions easier to trust. Its retrieved evidence is strongly aligned with regulatory topology. It generalizes to unseen genes and cell types. And it is data-efficient, performing well even where prior knowledge is sparse.
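One way to picture alignment between evidence and regulatory topology is to rank retrieved passages by how close the genes they mention are to the perturbed gene in the regulatory graph. The sketch below is purely illustrative and is not AROMA's mechanism, which this thread does not specify. The scoring rule `1 / (1 + hops)`, the function names, and the placeholder genes and passage IDs are all assumptions.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

RegulatoryGraph = Dict[str, Set[str]]  # regulator -> downstream targets


def hop_distances(graph: RegulatoryGraph, source: str) -> Dict[str, int]:
    """Breadth-first search: downstream hop count from `source` to each reachable gene."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        gene = queue.popleft()
        for target in graph.get(gene, set()):
            if target not in distances:
                distances[target] = distances[gene] + 1
                queue.append(target)
    return distances


def rank_evidence(
    passages: List[Tuple[str, Set[str]]], graph: RegulatoryGraph, perturbed_gene: str
) -> List[Tuple[str, float]]:
    """Score each passage by its closest mentioned gene; unreachable genes score 0."""
    distances = hop_distances(graph, perturbed_gene)
    scored = []
    for passage_id, genes in passages:
        reachable = [distances[g] for g in genes if g in distances]
        score = 1.0 / (1 + min(reachable)) if reachable else 0.0
        scored.append((passage_id, score))
    # Highest score first; ties keep their original order (sort is stable).
    return sorted(scored, key=lambda item: item[1], reverse=True)


graph: RegulatoryGraph = {
    "GENE_A": {"GENE_B"},
    "GENE_B": {"GENE_C"},
    "GENE_D": set(),
}
passages = [
    ("p1", {"GENE_C"}),  # two hops downstream of GENE_A
    ("p2", {"GENE_B"}),  # one hop downstream of GENE_A
    ("p3", {"GENE_D"}),  # not reachable from GENE_A
]

ranked = rank_evidence(passages, graph, perturbed_gene="GENE_A")
print(ranked)
```

A purely text-based retriever would have no reason to prefer `p2` over `p3`; bringing the graph into the ranking is what ties the evidence to the regulatory structure.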

## Application Prospects and Limitations: From Biomedicine to Future Directions

Potential applications include drug discovery (predicting drug effects and screening targets), disease research (simulating the impact of mutations), synthetic biology (optimizing gene circuits), and personalized medicine (predicting treatment responses). The current model also has limitations: it lacks single-cell resolution, models dynamic processes only to a limited extent, does not use spatial information, and produces causal inferences that still need validation. Future directions follow from these gaps: integrating single-cell RNA sequencing data, adding a temporal dimension, incorporating spatial transcriptomics, and developing causal validation frameworks.

## Conclusion: Significance of AROMA to the Virtual Cell Field and Open-Source Contributions

AROMA represents an important advance in virtual cell modeling. It shows that combining knowledge-driven multimodal modeling with explicit reasoning can deliver both accuracy and interpretability. The model weights and code have been open-sourced, which should help virtual cell technology move from research to application in understanding living systems, treating disease, and designing biological systems.
