Zing Forum

CODRUG: A Visualized QSAR Machine Learning Analysis Tool for Drug R&D

This article introduces CODRUG, a graphical QSAR analysis tool built on PyQt5. It integrates the complete workflow of molecular descriptor generation, feature engineering, model construction, and validation, giving medicinal chemistry researchers a code-free machine learning solution.

Tags: QSAR, drug R&D, machine learning, molecular descriptors, PyQt5, cheminformatics, RDKit, bioactivity prediction
Published 2026-06-11 10:45 | Recent activity 2026-06-11 10:51 | Estimated read 7 min

Section 01

Introduction: CODRUG—A Code-Free QSAR Machine Learning Visualization Tool for Drug R&D

CODRUG is a PyQt5-based graphical tool that lets medicinal chemistry researchers run a full QSAR workflow, from descriptor generation through model validation, without writing code. The open-source tool is maintained by Moisés Maia and is registered as a computer program with Brazil's National Institute of Industrial Property (INPI). Its goal is to simplify traditional QSAR modeling and lower the barrier to applying machine learning in drug R&D.


Section 02

Background: The Importance of QSAR Analysis and Pain Points of Traditional Modeling

Quantitative Structure-Activity Relationship (QSAR) analysis is a core method for predicting the biological activity of compounds. By modeling how activity depends on chemical structure, it narrows the scope of experimental screening and reduces R&D costs. The traditional QSAR workflow, however, involves multiple steps such as data collection, descriptor calculation, and feature selection, and it requires specialized software and statistical knowledge. This high barrier limits the adoption of machine learning in medicinal chemistry. CODRUG is designed to simplify the process so that researchers can focus on scientific questions rather than technical details.


Section 03

Core Functional Modules of CODRUG

CODRUG provides end-to-end QSAR analysis capabilities, with main modules including:

  1. Dataset preparation and preprocessing: Supports ChEMBL import, automatic standardization and cleaning, and handles missing values/outliers;
  2. Molecular descriptor generation and feature engineering: Integrates RDKit/PaDEL-Descriptor to calculate hundreds of descriptors, supporting feature selection/dimensionality reduction;
  3. Model construction and validation: Built-in regression, classification, and clustering algorithms, with hyperparameter optimization and cross-validation implemented via PyCaret/Scikit-learn;
  4. External database prediction: Applies models to external compound libraries, supporting virtual screening and lead compound optimization.
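
As an illustration of the dataset preparation step in the list above, the sketch below cleans a small set of activity records and converts IC50 values to pIC50. It is an assumption about what such a step typically involves, not CODRUG's actual code; the record layout and the `to_pic50` and `clean_records` helpers are hypothetical, and the values are made up.

```python
import math


def to_pic50(ic50_nm: float) -> float:
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in molar)."""
    return 9.0 - math.log10(ic50_nm)


def clean_records(records):
    """Drop rows with missing or non-positive IC50 and repeated SMILES.

    Duplicates are detected by plain string comparison; a real pipeline
    would canonicalize the SMILES first (for example with RDKit).
    """
    seen = set()
    cleaned = []
    for rec in records:
        smiles = rec.get("smiles")
        ic50 = rec.get("ic50_nm")
        if not smiles or ic50 is None or ic50 <= 0:
            continue  # missing or unusable activity value
        if smiles in seen:
            continue  # keep only the first record per structure
        seen.add(smiles)
        cleaned.append({"smiles": smiles, "pic50": to_pic50(ic50)})
    return cleaned


# Made-up example records (hypothetical layout, not a ChEMBL export).
raw = [
    {"smiles": "CCO", "ic50_nm": 1000.0},
    {"smiles": "CCO", "ic50_nm": 1200.0},     # duplicate structure
    {"smiles": "c1ccccc1", "ic50_nm": None},  # missing activity
    {"smiles": "CC(=O)O", "ic50_nm": 10.0},
]
cleaned = clean_records(raw)
for row in cleaned:
    print(row["smiles"], round(row["pic50"], 2))
```

Working on the logarithmic pIC50 scale is a common choice because raw IC50 values span several orders of magnitude.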

Section 04

Technical Architecture and Key Dependencies

CODRUG is built on Python 3.10.12 and PyQt5 (Qt 5.15.14). Its tech stack includes:

  • GUI framework: PyQt5;
  • Cheminformatics: RDKit 2024.03.5;
  • Data processing: Pandas, NumPy;
  • Machine learning: Scikit-learn, PyCaret;
  • Deep learning: TensorFlow, PyTorch;
  • Visualization: Matplotlib, Seaborn;
  • Data acquisition: ChEMBL Web Client;
  • Descriptor calculation: PaDELpy.
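
A quick way to see which of the packages listed above are available in a given environment is to probe for them without importing them. This is a hedged sketch, not CODRUG's own dependency check; the import names in `REQUIRED_MODULES` are assumptions based on each package's usual module name, and `missing_dependencies` is a hypothetical helper.

```python
from importlib.util import find_spec

# Display name -> assumed top-level import name.
REQUIRED_MODULES = {
    "PyQt5": "PyQt5",
    "RDKit": "rdkit",
    "Pandas": "pandas",
    "NumPy": "numpy",
    "Scikit-learn": "sklearn",
    "PyCaret": "pycaret",
    "TensorFlow": "tensorflow",
    "PyTorch": "torch",
    "Matplotlib": "matplotlib",
    "Seaborn": "seaborn",
    "ChEMBL Web Client": "chembl_webresource_client",
    "PaDELpy": "padelpy",
}


def missing_dependencies(modules=REQUIRED_MODULES):
    """Return the display names of packages that cannot be located."""
    return sorted(
        name for name, module in modules.items() if find_spec(module) is None
    )


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages were found.")
```

Using `find_spec` only locates each module and does not import it, so the check stays fast even for heavy packages such as TensorFlow.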

Section 05

Installation Guide and Platform Compatibility Notes

Installation is simple: download the source code, unzip it, and run the main program. On first launch, the program automatically creates a virtual environment and checks dependencies. The interface uses a tabbed layout that follows the QSAR workflow (data import → preprocessing → descriptors → model → validation → prediction), which makes it approachable for beginners. So far, CODRUG has been fully tested only on Linux Mint 21.3 (CUDA 12.4). Other Ubuntu-based distributions should work in principle, while Windows and macOS require additional adaptation or a virtual machine.


Section 06

Application Value and Target User Groups

Target users include medicinal chemists without a programming background, computational chemistry students (as a teaching tool), small teams that need an integrated platform, and industrial R&D departments doing rapid prototyping. The tool's value lies in encapsulating complex processes so that researchers can focus on scientific hypotheses. Its open-source GPL license keeps it freely available to academia and open to community contributions.


Section 07

Future Development Directions: Cross-Platform Support and Feature Enhancement

CODRUG's future expansion plans include:

  1. Cross-platform support: Add official support for Windows/macOS;
  2. Cloud deployment: Develop a web version to support collaboration;
  3. Deep learning integration: Enhance support for Graph Neural Networks (GNN);
  4. Automated workflow: Introduce AutoML to further lower the barrier to entry;
  5. Model interpretation: Add interpretability features to understand structure-activity relationships.

Section 08

Conclusion: Tool Democratization Empowers Drug R&D

CODRUG is a notable attempt at democratizing drug R&D tools. By wrapping professional QSAR capabilities in a user-friendly interface, it lowers the barrier to applying machine learning and lets more researchers benefit from computational methods. For professionals in drug design and computational chemistry, it is an open-source tool worth following.