# Deep Reinforcement Learning + Graph Neural Networks: A New Paradigm for Antibiotic Discovery from Johns Hopkins University

> A final project from Johns Hopkins University's machine learning course combines GATv2 graph neural networks with Proximal Policy Optimization (PPO) reinforcement learning to discover new antibiotic candidate molecules targeting Staphylococcus aureus and Escherichia coli. The system outperforms traditional baseline methods on multiple metrics, generating 20,031 unique and effective molecules, providing a reproducible technical path for AI-driven drug discovery.

- 板块: [Openclaw Geo](https://www.zingnex.cn/en/forum/board/openclaw-geo)
- 发布时间: 2026-05-28T00:46:17.000Z
- 最近活动: 2026-05-28T00:51:12.253Z
- 热度: 159.9
- 关键词: 抗生素发现, 图神经网络, 强化学习, PPO, GATv2, 药物发现, 分子生成, JHU
- 页面链接: https://www.zingnex.cn/en/forum/thread/geo-github-jsf3467v-antibiotic-discovery
- Canonical: https://www.zingnex.cn/forum/thread/geo-github-jsf3467v-antibiotic-discovery
- Markdown 来源: floors_fallback

---

## Introduction: JHU Final Project Explores New Paradigm for Antibiotic Discovery Using GATv2 + PPO

A final project from Johns Hopkins University's AI master's program combines GATv2 graph neural networks with Proximal Policy Optimization (PPO) reinforcement learning to discover new antibiotic candidate molecules targeting Staphylococcus aureus and Escherichia coli. The system outperforms traditional baseline methods on multiple metrics, generating 20,031 unique and effective molecules, providing a reproducible technical path for AI-driven drug discovery.

## Background: Urgent Challenge of Antibiotic Resistance and the Need for AI Intervention

The World Health Organization has listed antibiotic resistance as one of the top ten threats to global public health. The traditional drug discovery process is time-consuming and costly, and the intervention of AI technology is changing this situation. Students from Johns Hopkins University's AI master's program explored a technical path combining Graph Neural Networks (GNN) and Reinforcement Learning (RL), aiming to automatically generate new molecular structures with antibiotic potential.

## Technical Architecture: Multi-task GATv2 + Three-stage PPO Reinforcement Learning

### Multi-task GATv2 Encoder
A three-layer GATv2 structure (128-dimensional hidden layer, 4 attention heads, average + max pooling) is used to predict the Minimum Inhibitory Concentration (MIC) for Staphylococcus aureus and Escherichia coli, with test set AUROC scores of 0.83 and 0.87 respectively (E. coli is close to the empirical noise ceiling).

### Three-stage PPO Agent
The policy network is based on GATv2, with autoregressive type/anchor/target heads, behavior cloned from active antibiotic trajectories, and KL divergence anchored to BC prior. Three-stage curriculum learning: Structure exploration (KL=1.0) → Size climbing (KL=0.5, target 25-30 heavy atoms) → Top-100 gated expansion.

### Alternative Fingerprint MLP Decoupled Reward
An alternative fingerprint MLP is used instead of the full GNN for reward scoring to improve training efficiency, with approximately 20,000 training episodes across 32 parallel environments.

## Dataset and Evaluation Framework: Multi-source Data Integration and Composite Reward Design

### Data Sources
Integrates ChEMBL 33 (78,314 compound-organism observation data), DrugBank 5.x (458 antibiotic SMILES), and CARD (457 substrate SMILES), using an 80/10/10 scaffold split to ensure chemical diversity differences.

### Composite Reward Function
Consists of efficacy signal (GNN-predicted MIC), QED (drug-likeness), synthetic accessibility (SA), novelty score (vs DrugBank), and resistance avoidance score (vs CARD).

## Experimental Results: Molecular Generation Performance Exceeding Traditional Baselines

- Generated 20,031 unique and effective molecules, outperforming random construction, hill-climbing algorithm, and SMILES-RNN baselines under Bonferroni-corrected p<1e-5, with Cliff's delta effect sizes of 0.97, 0.73, and 0.05 respectively.
- Chemical novelty: GA exhibited mode collapse (converged to a single scaffold), RL-generated molecules had a scaffold diversity of 0.003, and the lowest Fréchet ChemNet distance (26.1 vs average 43.8).
- Baseline comparison:
| Method               | Scaffold Diversity | Novelty Advantage | Chemical Distance |
|----------------------|-------------------|-------------------|-------------------|
| Random Construction  | Low               | None              | High              |
| Genetic Algorithm (GA)| Extremely Low (mode collapse) | Limited | Medium |
| Hill-climbing Algorithm | Low | Limited | Medium |
| SMILES-RNN           | Medium            | Sample size-driven | Medium |
| **RL (This Project)** | **0.003** | **Significant** | **26.1 (Optimal)** |

## Limitations and Practical Significance: Current Shortcomings and Value of Reproducible Path

### Limitations
1. Over 95% of generated molecules trigger Brenk structural alerts and require refinement by medicinal chemists; 2. No directly synthesizable lead compounds were generated; 3. There is soft cross-task scaffold leakage in training data; 4. Consistency bias between the surrogate and GNN (Pearson r=0.52, binary consistency 63%).

### Practical Significance
Demonstrates the effectiveness of the GNN+RL combination, provides a complete reproducible workflow (from data preprocessing to evaluation), open-sources training checkpoints (30MB) and code, and provides a benchmark for subsequent research.

## Project Structure and Usage: Modular Design and Reproducible Workflow

Core code is located in the `src/` directory: `gnn.py` (GATv2 regressor), `rl.py` (MDP environment and PPO trainer), `rewards.py` (composite reward), `feature_engineering.py` (graph features), training scripts (`train_gnn.py`/`train_rl.py`), evaluation scripts (`evaluate.py`, etc.). Reproducible workflow: Data extraction → GNN training (≈2 hours) → RL training (6-8 hours) → Baseline comparison (1 hour), which can be completed on standard hardware.

## Summary and Insights: Potential and Value of AI-Assisted Drug Discovery

This project is a typical application of AI in drug discovery: as a screening/generation tool to accelerate early discovery, rather than replacing traditional medicinal chemistry. For ML practitioners: it demonstrates the combination of GNN and RL to solve scientific problems, application of curriculum learning, and surrogate models to accelerate training; For the drug discovery field: it provides a scalable framework, and although there is a gap from clinical candidate molecules, it shows the great potential of AI-assisted antibiotic discovery. The full paper PDF is available on Hugging Face: <https://huggingface.co/jsf3467v/antibiotic-discovery/blob/main/paper.pdf>
