AROMA: A New Framework for Predicting Gene Perturbation in Virtual Cells by Integrating Multimodal Reasoning and Reinforcement Learning

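Tags: virtual cell modeling, gene perturbation prediction, multimodal learning, knowledge graphs, reinforcement learning, GRPO, computational biology, ACL 2026, AI4Science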

Published 2026-04-23 14:13 | Recent activity 2026-04-23 14:53 | Estimated read 10 min

Section 01

Introduction

AROMA is a multimodal virtual cell modeling framework accepted at ACL 2026. It integrates textual evidence, graph topology, and protein sequences, and combines retrieval augmentation with GRPO reinforcement learning to predict the effects of gene perturbations and to explain those predictions. The framework targets the high cost and long turnaround of traditional gene perturbation experiments, and it sits at the intersection of natural language processing and computational biology.

Section 02

Research Background and Core Challenges

In biomedical research, gene perturbation experiments are a core method for understanding cell function and disease mechanisms. Traditional wet-lab experiments, however, are costly and time-consuming, which makes it impractical to explore the effects of very large numbers of gene combinations systematically. Virtual cell modeling, an active direction in computational biology, simulates how cells respond to gene perturbations, which can reduce cost and accelerate drug target discovery. The field faces three major challenges:

Data Heterogeneity: Gene function information is scattered across multimodal sources such as text literature, knowledge graphs, and protein sequences. A single modality cannot capture the complete biological context;
Lack of Interpretability: Although black-box models can predict perturbation effects, they cannot provide causal explanations understandable to biologists;
Limited Generalization Ability: The gene combinations covered by training data are limited, so models struggle to generalize to unseen perturbation scenarios.

AROMA (Augmented Reasoning Over a Multimodal Architecture) is proposed to address these three challenges and has been accepted to the ACL 2026 main conference.

Section 03

Technical Architecture: Data Construction and Multimodal Encoding

AROMA's technical architecture comprises two phases: data construction and modeling.

Data Phase: Dual Knowledge Graph Construction

AROMA constructs two complementary biological knowledge graphs:

Gene-KG: Captures functional associations, regulatory relationships, and pathway memberships between genes;
Path-KG: Depicts the hierarchical structure of biological signaling pathways and cross-pathway interactions.

In addition, a large-scale virtual cell reasoning dataset, PerturbReason, is constructed to provide a foundation for evidence retrieval and reasoning.
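To make the role of the two graphs concrete, the sketch below stores each one as a list of (head, relation, tail) triples and gathers one-hop evidence for a perturbed gene, following pathway membership from Gene-KG into Path-KG. It is only an illustration under stated assumptions: the triple layout, the relation names, the helper functions `build_index` and `retrieve_evidence`, and the placeholder entities (`GENE_A`, `PATHWAY_X`, and so on) are all hypothetical and do not come from the released graphs.

```python
from collections import defaultdict

# Hypothetical toy triples; gene and pathway names are placeholders, not real annotations.
GENE_KG = [
    ("GENE_A", "regulates", "GENE_B"),
    ("GENE_B", "interacts_with", "GENE_C"),
    ("GENE_A", "member_of", "PATHWAY_X"),
]
PATH_KG = [
    ("PATHWAY_X", "child_of", "PATHWAY_ROOT"),
    ("PATHWAY_X", "crosstalk_with", "PATHWAY_Y"),
]


def build_index(triples):
    """Index triples by head entity so neighbors can be looked up directly."""
    index = defaultdict(list)
    for head, relation, tail in triples:
        index[head].append((relation, tail))
    return index


def retrieve_evidence(gene, gene_index, path_index):
    """Collect one-hop Gene-KG facts for `gene`, plus Path-KG facts for its pathways."""
    evidence = []
    for relation, tail in gene_index.get(gene, []):
        evidence.append((gene, relation, tail))
        # Follow pathway membership into the second graph.
        if relation == "member_of":
            for path_relation, path_tail in path_index.get(tail, []):
                evidence.append((tail, path_relation, path_tail))
    return evidence


gene_index = build_index(GENE_KG)
path_index = build_index(PATH_KG)
evidence = retrieve_evidence("GENE_A", gene_index, path_index)
```

Keeping the graphs separate but linked through a membership relation lets gene-level and pathway-level facts be retrieved together for a single query, which is the kind of complementarity the two graphs are described as providing.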

Modeling Phase: Retrieval-Augmented Multimodal Encoding

Given a gene perturbation query, the model proceeds in four steps:

Retrieve Relevant Evidence: Retrieve relevant textual evidence from knowledge graphs and literature;
Graph Neural Network Encoding: Use a graph neural network (GNN) to extract topological features from Gene-KG and Path-KG, capturing the structural role of genes in biological networks;
Protein Sequence Encoding: Use the ESM-2 pre-trained model to encode protein sequences, capturing functional information at the amino acid level;
Cross-Modal Attention Fusion: Explicitly model the dependency between perturbed genes and target genes across different modalities through a cross-attention module.

This design is neural-symbolic: it combines symbolic reasoning over structured knowledge with the representation learning of neural networks.
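The cross-modal fusion step can be illustrated with single-head scaled dot-product attention, in which the perturbed gene's embedding acts as the query and the target gene's per-modality embeddings act as keys and values. This is a minimal sketch, not AROMA's implementation: the learned query/key/value projections of a real cross-attention module are omitted, the 4-dimensional vectors are made up, and the names `cross_attention`, `perturbed_gene`, and `target_modalities` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cross_attention(query, keys, values):
    """Single-head scaled dot-product attention of one query over several key/value vectors."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    # Weighted sum of the value vectors, one output coordinate at a time.
    fused = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return fused, weights


# Hypothetical embeddings, assumed to be already projected to a shared size.
perturbed_gene = [1.0, 0.0, 1.0, 0.0]  # query: the perturbed gene
target_modalities = {                  # keys and values: the target gene, per modality
    "text": [1.0, 0.0, 1.0, 0.0],
    "graph": [0.0, 1.0, 0.0, 1.0],
    "protein": [0.5, 0.5, 0.5, 0.5],
}
names = list(target_modalities)
vectors = [target_modalities[name] for name in names]
fused, weights = cross_attention(perturbed_gene, vectors, vectors)
attention_by_modality = dict(zip(names, weights))
```

In this toy input the text embedding is most aligned with the query, so it receives the largest attention weight. The weights also give a rough per-modality view of which evidence the fused representation draws on.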

Section 04

Technical Architecture: Training Optimization Strategy

AROMA adopts a two-stage training strategy to optimize the model:

First Stage: Multimodal Supervised Fine-Tuning (SFT)

The model is first trained with multimodal supervised learning on the PerturbReason dataset. This stage teaches the basic mapping from input queries to perturbation effect predictions, so that the model acquires foundational biological knowledge and prediction ability before reinforcement learning begins.

Second Stage: GRPO Reinforcement Learning Optimization

The second stage applies Group Relative Policy Optimization (GRPO) for reinforcement learning fine-tuning. GRPO samples a group of responses for each query and scores every response relative to the others in its group, which removes the need for the separate critic (value) model that PPO trains and that is a common source of training instability. This stage improves prediction accuracy and also steers the model toward biologically meaningful, interpretable reasoning, optimizing performance and interpretability together.
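The group-relative reward signal at the heart of GRPO can be sketched in a few lines: each sampled response's reward is normalized by the mean and standard deviation of the rewards in its own group, and the result is used as the advantage in the policy update. The reward values below are hypothetical, the function name `group_relative_advantages` is illustrative, and the use of the population standard deviation with a small `eps` is an assumption rather than a detail taken from AROMA.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every response in the group earns the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four responses sampled for the same perturbation query,
# e.g. 1.0 for a correct prediction and 0.0 for an incorrect one.
group_rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(group_rewards)
```

Because the baseline is the group mean, responses that beat their siblings get positive advantages and the rest get negative ones, with no learned value function involved. A group in which all responses earn the same reward yields zero advantages and therefore contributes no gradient signal.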

Section 05

Experimental Validation and Open-Source Contributions

AROMA is fine-tuned from the Qwen3-8B base model, building on the language understanding and generation capabilities of an open-source large language model. The research team has released the following on the Hugging Face platform:

Model Weights: blazerye/AROMA;
Reasoning Dataset: blazerye/PerturbReason (full version);
Knowledge Graphs: Complete versions of Gene-KG and Path-KG.

This comprehensive open-source release lowers the barrier to reproduction and provides useful infrastructure for the computational biology community.
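For readers who want to fetch the released artifacts, one possible route is `snapshot_download` from the `huggingface_hub` package, sketched below. The two repository ids are the ones named above. The `repo_type` values ("model" and "dataset") are assumptions about how the repositories are registered, and the helper `download_artifacts` is hypothetical. The location of the knowledge graph files is not stated here, so they are left out.

```python
# Repository ids are taken from the article; the repo_type values are assumptions.
ARTIFACTS = [
    {"repo_id": "blazerye/AROMA", "repo_type": "model"},
    {"repo_id": "blazerye/PerturbReason", "repo_type": "dataset"},
]


def download_artifacts(artifacts=ARTIFACTS):
    """Download each repository snapshot to the local Hugging Face cache and return the paths."""
    # Imported lazily so this module loads even when huggingface_hub is not installed.
    from huggingface_hub import snapshot_download

    return {
        item["repo_id"]: snapshot_download(
            repo_id=item["repo_id"], repo_type=item["repo_type"]
        )
        for item in artifacts
    }
```

Calling `download_artifacts()` requires network access and enough disk space for the weights of an 8B-parameter model. It returns a dictionary mapping each repository id to its local snapshot directory.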

Section 06

Technical Significance and Future Outlook

Technical Significance

AROMA offers three takeaways for the AI for Science field:

New Paradigm for Multimodal Fusion: Demonstrates the idea of unified modeling of text, graph structure, and sequence data, which can be extended to fields such as materials science and drug discovery;
Practical Path for Interpretable AI: Provides a feasible solution for interpretable prediction in scientific fields through explicit evidence retrieval and structured knowledge integration;
Application of Reinforcement Learning in Scientific Reasoning: The successful application of GRPO to biological reasoning tasks extends reinforcement learning fine-tuning, in the family of RLHF/RLAIF techniques, to specialized scientific domains.

Future Outlook

As single-cell sequencing becomes widespread and spatial transcriptomics matures, virtual cell modeling is expected to incorporate finer-grained cell state information. The AROMA architecture is extensible and could integrate emerging data modalities such as single-cell expression profiles and spatial location information, moving toward the long-term goal of 'digital twin cells'.
