Zing Forum

Predicting Molecular Properties with Graph Neural Networks: An End-to-End Platform from SMILES to Solubility

A complete molecular property prediction platform that represents molecules as graph structures, compares three architectures (GCN, GraphSAGE, and GIN), and integrates explainable AI and REST API deployment.

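Tags: Graph Neural Networks, Molecular Property Prediction, GNN, Drug Discovery, Explainable AI, PyTorch Geometric, GNNExplainer, SMILES, Solubility Prediction, Machine Learning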
Published 2026-06-13 05:13 | Recent activity 2026-06-13 05:21 | Estimated read 5 min

Section 01

Introduction: End-to-End Platform for Predicting Molecular Solubility with Graph Neural Networks

This open-source project provides an end-to-end platform for predicting water solubility, a key property in drug development. It represents molecules as graph structures, compares three GNN architectures (GCN, GraphSAGE, and GIN), and integrates explainable AI (GNNExplainer) with a production-oriented deployment stack (FastAPI + React). Representing molecules as graphs addresses a known weakness of traditional machine learning, which has difficulty capturing molecular topology.

Section 02

Background: Why Molecules Need to Be Represented as Graph Structures

The core challenge for traditional machine learning is that molecules are not tabular data: their topology (atom connectivity, rings, and branches) determines their chemical properties. A GNN models atoms as nodes and chemical bonds as edges, so it preserves that topology while learning to predict properties. This project focuses on water solubility prediction, citing the figure that 40% of drug candidates fail because of solubility issues.

Section 03

Methodology: Project Architecture and Comparison of Three GNN Models

Workflow: SMILES string → RDKit parsing → Graph construction → GNN model → Prediction → GNNExplainer interpretation.
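
To make the graph-construction step concrete, here is a minimal sketch of the kind of data a parser such as RDKit would hand to the GNN for ethanol (SMILES `CCO`). The atoms and bonds are written out by hand rather than parsed, and `build_graph`, `ATOM_TYPES`, and the one-hot feature scheme are illustrative assumptions, not the project's actual featurization.

```python
# Hypothetical sketch: atoms and bonds for ethanol ("CCO") written out by hand,
# standing in for what a SMILES parser such as RDKit would produce.
ATOM_TYPES = ["C", "N", "O"]  # assumed vocabulary for one-hot node features


def build_graph(atoms, bonds):
    """Return (node_features, edge_index) for an undirected molecular graph."""
    # One-hot encode each atom's element symbol.
    node_features = [[1 if a == t else 0 for t in ATOM_TYPES] for a in atoms]
    # Store each bond in both directions so messages can flow both ways.
    sources, targets = [], []
    for i, j in bonds:
        sources += [i, j]
        targets += [j, i]
    return node_features, [sources, targets]


atoms = ["C", "C", "O"]   # heavy atoms only; hydrogens left implicit
bonds = [(0, 1), (1, 2)]  # C-C and C-O single bonds
x, edge_index = build_graph(atoms, bonds)
print(x)           # [[1, 0, 0], [1, 0, 0], [0, 0, 1]]
print(edge_index)  # [[0, 1, 1, 2], [1, 0, 2, 1]]
```

The two-row edge list mirrors the `edge_index` layout used by PyTorch Geometric, where the first row holds source nodes and the second row holds target nodes.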

Three GNN Architectures:

  1. GCN: Updates each node's representation by aggregating normalized features from its neighbors;
  2. GraphSAGE: Samples a subset of neighbors and learns an aggregation function (mean, LSTM, or pooling);
  3. GIN: Grounded in graph isomorphism testing theory; its expressive power matches that of the Weisfeiler-Lehman test, which lets it capture subtle structural differences.
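
The practical difference between these aggregators can be seen on a toy case. The sketch below is a deliberate simplification: it compares plain mean aggregation (in the spirit of GCN and mean-based GraphSAGE, ignoring their normalization and learned weights) with the sum aggregation that GIN uses, on two hypothetical neighborhoods with scalar features.

```python
def mean_aggregate(neighbor_features):
    """Average the neighbors' features (simplified GCN / GraphSAGE-mean)."""
    return sum(neighbor_features) / len(neighbor_features)


def sum_aggregate(neighbor_features):
    """Sum the neighbors' features (the aggregator used by GIN)."""
    return sum(neighbor_features)


# Two hypothetical atoms: one has a single neighbor, the other has two
# neighbors, and every neighbor carries the same feature value.
one_neighbor = [1.0]
two_neighbors = [1.0, 1.0]

mean_same = mean_aggregate(one_neighbor) == mean_aggregate(two_neighbors)
sum_same = sum_aggregate(one_neighbor) == sum_aggregate(two_neighbors)
print(mean_same, sum_same)  # True False
```

Because a sum keeps track of how many neighbors contributed, it separates neighborhoods that a mean collapses together. That is the intuition behind GIN's stronger ability to tell structures apart.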

Section 04

Evidence: Overwhelming Advantage of GIN Model on ESOL Dataset

ESOL dataset (1128 molecules) test results:

Model       MAE      RMSE
GCN         1.4526   1.8407
GraphSAGE   1.4160   1.7666
GIN         0.6876   0.8566

GIN's MAE and RMSE are each less than half those of GCN and GraphSAGE, so it is selected as the main production model.
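
As a quick check of that comparison, the sketch below defines MAE and RMSE in the usual way and computes the ratio of GIN's reported errors to those of the other two models. The numbers are copied from the results above; the `mae` and `rmse` helpers are generic definitions, not the project's evaluation code.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# (MAE, RMSE) per model, as reported on the ESOL test set.
reported = {
    "GCN": (1.4526, 1.8407),
    "GraphSAGE": (1.4160, 1.7666),
    "GIN": (0.6876, 0.8566),
}

gin_mae, gin_rmse = reported["GIN"]
ratios = {
    name: (gin_mae / m, gin_rmse / r)
    for name, (m, r) in reported.items()
    if name != "GIN"
}
for name, (mae_ratio, rmse_ratio) in ratios.items():
    print(f"GIN vs {name}: MAE ratio {mae_ratio:.3f}, RMSE ratio {rmse_ratio:.3f}")
```

Every ratio comes out below 0.5, which is consistent with the "less than half" claim for both metrics.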

Section 05

Explainable AI: GNNExplainer Makes Predictions Transparent

Integrating GNNExplainer provides:

  1. The predicted log solubility value;
  2. Identification of the key atoms behind the prediction;
  3. Heatmaps showing per-atom importance;
  4. Highlighting of key substructures (e.g., hydroxyl groups increase solubility, while hydrophobic carbon chains decrease it).

These outputs help users understand the model and offer chemical insight, which makes the approach suitable for high-stakes fields.
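
One way an atom-importance heatmap could be derived is sketched below: raw per-atom scores are min-max scaled to [0, 1] for coloring, and the highest-scoring atoms are reported as key atoms. The raw scores here are made-up values for ethanol (`CCO`), not GNNExplainer output, and `normalize_importance` and `top_atoms` are hypothetical helper names.

```python
def normalize_importance(scores):
    """Min-max scale raw importance scores to [0, 1] for heatmap coloring."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All atoms equally important: nothing to highlight.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def top_atoms(atoms, scores, k=1):
    """Return (index, symbol) pairs for the k highest-scoring atoms."""
    ranked = sorted(range(len(atoms)), key=lambda i: scores[i], reverse=True)
    return [(i, atoms[i]) for i in ranked[:k]]


atoms = ["C", "C", "O"]
raw_scores = [1.0, 2.0, 5.0]  # made-up values for illustration only
heat = normalize_importance(raw_scores)
print(heat)                         # [0.0, 0.25, 1.0]
print(top_atoms(atoms, heat, k=1))  # [(2, 'O')]
```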

Section 06

Production Deployment: FastAPI + React Full-Stack Solution

Backend API (FastAPI):

  • GET /health: Health check;
  • POST /predict: Accepts a SMILES string and returns the predicted solubility;
  • POST /visualize: Generates a 2D image of the molecular structure;
  • POST /explain: Returns the prediction together with an explanation visualization;
  • POST /analyze: Comprehensive endpoint.
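
A client call to the prediction endpoint might look like the sketch below, which builds the request without sending it. The base URL and the `smiles` field name in the JSON body are assumptions, since the request schema is not given here; only the `POST /predict` route comes from the list above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed local development address


def build_predict_request(smiles):
    """Build (but do not send) a POST /predict request for one SMILES string."""
    # The "smiles" key is an assumed field name, not a confirmed schema.
    payload = json.dumps({"smiles": smiles}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_predict_request("CCO")
print(request.get_method(), request.full_url)
# To send it against a running server: urllib.request.urlopen(request)
```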

Frontend Interface (React + Vite): Supports SMILES input for prediction, structure viewing, explanation graphs, and browsing of benchmark results.

Section 07

Application Scenarios and Future Development Directions

Applications:

  • Drug discovery: Screen out molecules with solubility problems early to save costs;
  • Materials science: Extend the approach to predict other properties such as toxicity and bioavailability.

Future Directions:

  • Real-time molecular hand-drawing interface;
  • Expansion to more datasets;
  • Hyperparameter optimization;
  • Docker cloud deployment;
  • Model monitoring and analysis.

Conclusion: This project provides an end-to-end toolchain for AI-assisted chemistry. Its benchmark results suggest that choosing a model suited to the structure of the data (here, GIN for molecular graphs) is key.