Zing Forum

PyTorch-based Intelligent Water Source Type Recognition System: From Data Synthesis to Deep Learning Classification

A neural network project built with PyTorch that automatically classifies four water source types (groundwater, rainwater, river/lake water, and seawater) by analyzing 30 water quality indicators. The project includes a complete training and inference workflow, as well as a synthetic dataset with over 100,000 samples.

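Tags: PyTorch, deep learning, water source classification, environmental monitoring, neural network, water quality analysis, machine learning, Python, classification model, environmental AI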
Published 2026-05-25 23:15 | Recent activity 2026-05-25 23:19 | Estimated read 9 min

Section 01

Project Introduction: PyTorch-based Intelligent Water Source Type Recognition System

Hello everyone! Today I'm introducing a PyTorch-based project for intelligent water source type recognition. Developed by Tanish3939 and open-sourced on GitHub (project link: https://github.com/Tanish3939/Water-type-AI-prediction), it classifies a water sample into one of four source types (groundwater, rainwater, river/lake water, or seawater) by analyzing 30 water quality indicators. It includes a complete training and inference workflow and a synthetic dataset with over 100,000 samples, which makes it a handy reference example for AI applications in environmental monitoring.

Section 02

Project Background and Significance

Project Background

Water resource monitoring is a crucial part of environmental protection and public health. Traditional water source type identification relies on laboratory analysis and expert experience, which is time-consuming and costly. With the development of AI technology, it has become possible to automatically identify water source types from multi-dimensional water quality data using machine learning.

Project Significance

This project is a hands-on PyTorch deep learning exercise that demonstrates the complete workflow from data preparation to model deployment, providing a reusable technical framework for AI applications in environmental science.

Section 03

Dataset Composition and Feature Engineering

Synthetic Dataset

The project uses a synthetic dataset with over 100,000 samples, covering four water source types: groundwater, rainwater, river/lake water, and seawater.

30 Water Quality Feature Indicators

The model input consists of 30 parameters in four groups:

  • Basic Physicochemical Indicators: pH value, TDS, conductivity, turbidity, temperature, hardness, chloride, sulfate, nitrate, dissolved oxygen, BOD, TOC
  • Ions and Trace Elements: Sodium, iron, calcium, magnesium, potassium, ammonia, phosphate, alkalinity, silica, fluoride, manganese, arsenic
  • Isotope Characteristics: δ¹⁸O, δD, tritium, δ¹⁵N-nitrate
  • Microbial and Ratio Indicators: Coliform count, sodium-chloride ratio

Together, these features capture the chemical fingerprint that distinguishes one water source from another. Seawater, for example, has far higher TDS, conductivity, and chloride than rainwater.

Section 04

Neural Network Architecture Design

Model Structure

A four-layer fully connected neural network is used: Input layer (30 dimensions) → Hidden layer 1 (128 neurons) → Hidden layer 2 (64 neurons) → Hidden layer 3 (32 neurons) → Output layer (4 classes).

Key Design

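  • Activation Function: LeakyReLU (negative slope 0.05) to alleviate the dying ReLU problem, where neurons get stuck outputting zero
  • Regularization: Dropout (dropout rate 0.2) after the first two hidden layers to prevent overfitting
  • Output Layer: Softmax activation to output the probability distribution over the four classes
  • Loss Function: Cross-entropy loss (CrossEntropyLoss). Note that PyTorch's CrossEntropyLoss applies log-softmax internally and expects raw logits, so the softmax is normally applied only when probabilities are read out at inference time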
  • Optimizer: SGD (learning rate 0.01, momentum 0.9, weight decay 0.0001)
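
The layer sizes and settings listed above can be sketched in PyTorch roughly as follows. This is a minimal sketch, not the repository's actual code: the class name WaterSourceNet is hypothetical, and the model here is assumed to return raw logits, with softmax applied only when probabilities are needed (the repository may instead place softmax inside the model).

```python
import torch
from torch import nn


class WaterSourceNet(nn.Module):
    """30 -> 128 -> 64 -> 32 -> 4 fully connected classifier (hypothetical name)."""

    def __init__(self, n_features: int = 30, n_classes: int = 4) -> None:
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(n_features, 128),
            nn.LeakyReLU(negative_slope=0.05),
            nn.Dropout(p=0.2),  # dropout after hidden layer 1
            nn.Linear(128, 64),
            nn.LeakyReLU(negative_slope=0.05),
            nn.Dropout(p=0.2),  # dropout after hidden layer 2
            nn.Linear(64, 32),
            nn.LeakyReLU(negative_slope=0.05),
            nn.Linear(32, n_classes),  # assumption: raw logits, no softmax here
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)


model = WaterSourceNet()
criterion = nn.CrossEntropyLoss()  # applies log-softmax internally
optimizer = torch.optim.SGD(
    model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4
)

# Probabilities for a random stand-in batch of 8 samples.
model.eval()  # turns dropout off
with torch.no_grad():
    probs = torch.softmax(model(torch.randn(8, 30)), dim=1)

n_params = sum(p.numel() for p in model.parameters())
print(probs.shape, n_params)
```

With these sizes the network has only about 14 thousand trainable parameters, so it trains comfortably on a CPU.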

Section 05

Training Workflow and Data Preprocessing

Data Preprocessing

  • Data Split: 80% training set, 10% validation set, 10% test set
  • Standardization: Z-score normalization (x_normalized = (x - mean)/std) to unify feature scales
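
The split and standardization above might look like the sketch below. The data here is random stand-in data rather than the project's dataset, and the variable names are illustrative. The mean and standard deviation are taken from the training split only, in line with the post's note that the training statistics are stored with the model.

```python
import torch

torch.manual_seed(0)

# Stand-in data: 1,000 rows of 30 features with labels 0..3 (random values).
X = torch.randn(1000, 30) * 5.0 + 2.0
y = torch.randint(0, 4, (1000,))

# Shuffle once, then cut at 80% and 90% for train / validation / test.
perm = torch.randperm(X.shape[0])
n_train = int(0.8 * len(perm))
n_val = int(0.1 * len(perm))
train_idx = perm[:n_train]
val_idx = perm[n_train:n_train + n_val]
test_idx = perm[n_train + n_val:]

# Z-score statistics come from the training split only.
mean = X[train_idx].mean(dim=0)
std = X[train_idx].std(dim=0).clamp(min=1e-8)  # guard against zero variance


def standardize(x: torch.Tensor) -> torch.Tensor:
    """Apply x_normalized = (x - mean) / std with the training statistics."""
    return (x - mean) / std


X_train, y_train = standardize(X[train_idx]), y[train_idx]
X_val, y_val = standardize(X[val_idx]), y[val_idx]
X_test, y_test = standardize(X[test_idx]), y[test_idx]

print(X_train.shape, X_val.shape, X_test.shape)
```

Reusing the training statistics for the validation and test splits keeps information from those splits out of the preprocessing.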

Training Hyperparameters

  • Batch size: 256
  • Training epochs: 1048
  • Optimizer: SGD with momentum
  • Learning rate: 0.01, weight decay: 0.0001

Training Monitoring

The training loop prints the training loss, validation loss, and validation accuracy after every epoch, so overfitting can be spotted early.
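
A training loop with this kind of per-epoch monitoring could be sketched as below. It is an illustration under stated assumptions, not the project's script: the tensors are random stand-ins, the helper run_epoch is hypothetical, and EPOCHS is set to 3 so the example finishes quickly (the post lists 1048 epochs).

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
EPOCHS = 3  # demo value only; the post reports 1048

# Random stand-ins for already standardized features and labels.
X_train, y_train = torch.randn(1024, 30), torch.randint(0, 4, (1024,))
X_val, y_val = torch.randn(256, 30), torch.randint(0, 4, (256,))
train_loader = DataLoader(TensorDataset(X_train, y_train), batch_size=256, shuffle=True)
val_loader = DataLoader(TensorDataset(X_val, y_val), batch_size=256)

model = nn.Sequential(
    nn.Linear(30, 128), nn.LeakyReLU(0.05), nn.Dropout(0.2),
    nn.Linear(128, 64), nn.LeakyReLU(0.05), nn.Dropout(0.2),
    nn.Linear(64, 32), nn.LeakyReLU(0.05),
    nn.Linear(32, 4),  # assumption: raw logits
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)


def run_epoch(loader: DataLoader, train: bool) -> tuple[float, float]:
    """Run one pass over loader; return (mean loss, accuracy)."""
    model.train(train)  # dropout is active only while training
    total_loss, correct, seen = 0.0, 0, 0
    with torch.set_grad_enabled(train):
        for xb, yb in loader:
            logits = model(xb)
            loss = criterion(logits, yb)
            if train:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total_loss += loss.item() * len(yb)
            correct += (logits.argmax(dim=1) == yb).sum().item()
            seen += len(yb)
    return total_loss / seen, correct / seen


history = []
for epoch in range(1, EPOCHS + 1):
    train_loss, _ = run_epoch(train_loader, train=True)
    val_loss, val_acc = run_epoch(val_loader, train=False)
    history.append((train_loss, val_loss, val_acc))
    print(f"epoch {epoch}: train_loss={train_loss:.4f} "
          f"val_loss={val_loss:.4f} val_acc={val_acc:.3f}")
```

A training loss that keeps falling while the validation loss starts to rise is the usual sign of overfitting in a log like this.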

Section 06

Model Inference and Deployment

Inference Workflow

The project provides the predict_water_source.py script, supporting batch prediction:

  1. Load the pre-trained model and the standardization parameters
  2. Apply Z-score normalization to the new data
  3. Run a forward pass to get the class probability distribution
  4. Output the predicted water source type
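
The four steps above can be sketched as a small function. This is not the content of predict_water_source.py: the function name predict_batch, the class order in CLASS_NAMES, and the untrained stand-in model and statistics are all assumptions for illustration, and the model is assumed to return raw logits.

```python
import torch
from torch import nn

# Assumed class order; the real mapping is whatever the saved model stores.
CLASS_NAMES = ["groundwater", "rainwater", "river/lake water", "seawater"]


def predict_batch(model: nn.Module, mean: torch.Tensor, std: torch.Tensor,
                  x_raw: torch.Tensor) -> tuple[list[str], torch.Tensor]:
    """Return (labels, probabilities) for a batch of raw 30-feature rows."""
    model.eval()  # disable dropout for inference
    with torch.no_grad():
        x = (x_raw - mean) / std                 # step 2: Z-score normalization
        probs = torch.softmax(model(x), dim=1)   # step 3: forward pass
    labels = [CLASS_NAMES[i] for i in probs.argmax(dim=1).tolist()]  # step 4
    return labels, probs


# Step 1 stand-ins: an untrained network and placeholder statistics.
torch.manual_seed(0)
model = nn.Sequential(
    nn.Linear(30, 128), nn.LeakyReLU(0.05),
    nn.Linear(128, 64), nn.LeakyReLU(0.05),
    nn.Linear(64, 32), nn.LeakyReLU(0.05),
    nn.Linear(32, 4),
)
mean, std = torch.zeros(30), torch.ones(30)

batch = torch.randn(5, 30)  # five made-up samples
labels, probs = predict_batch(model, mean, std, batch)
for label, p in zip(labels, probs):
    print(f"{label}: {p.max().item():.3f}")
```

Because the stand-in model is untrained, the printed labels are meaningless; the sketch only shows the data flow.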

Model Persistence

The trained model is saved in .pth format, including:

  • Network weights and structure
  • Mean and standard deviation of training data
  • Class name mapping

Storing these statistics alongside the weights ensures that inference applies exactly the same preprocessing as training; normalizing new data with different statistics would skew the predictions.
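
One way to bundle these items into a single .pth file is a plain dictionary passed to torch.save, as sketched below. The dictionary keys, the build_model helper, the file name, and the class order are all hypothetical; the repository's checkpoint layout may differ.

```python
import os
import tempfile

import torch
from torch import nn


def build_model(n_features: int, hidden: list[int], n_classes: int) -> nn.Sequential:
    """Rebuild the network from a small architecture description."""
    layers: list[nn.Module] = []
    width = n_features
    for i, h in enumerate(hidden):
        layers += [nn.Linear(width, h), nn.LeakyReLU(0.05)]
        if i < 2:  # dropout after the first two hidden layers
            layers.append(nn.Dropout(0.2))
        width = h
    layers.append(nn.Linear(width, n_classes))
    return nn.Sequential(*layers)


arch = {"n_features": 30, "hidden": [128, 64, 32], "n_classes": 4}
model = build_model(**arch)

checkpoint = {
    "model_state": model.state_dict(),  # network weights
    "arch": arch,                       # enough to rebuild the structure
    "mean": torch.zeros(30),            # placeholder training mean
    "std": torch.ones(30),              # placeholder training std
    "class_names": ["groundwater", "rainwater", "river/lake water", "seawater"],
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "water_source_model.pth")  # hypothetical file name
    torch.save(checkpoint, path)
    loaded = torch.load(path)

restored = build_model(**loaded["arch"])
restored.load_state_dict(loaded["model_state"])
restored.eval()
print(loaded["class_names"])
```

Keeping only tensors, numbers, strings, and lists in the checkpoint means the file can be loaded without unpickling custom classes.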

Section 07

Technical Highlights and Application Scenarios

Technical Highlights

  1. Complete Workflow: Covers the entire process from data loading, preprocessing, model building, training to inference
  2. Professional Feature Engineering: 30-dimensional features combined with environmental chemistry knowledge, especially isotope indicators for water source tracing
  3. Engineering Details: Dynamic path finding, data separation, model metadata persistence
  4. Synthetic Data Application: Works around the difficulty of obtaining real measurements, demonstrating the potential of data simulation in environmental AI

Application Scenarios

  • Rapid field water quality monitoring
  • Pollution source tracing analysis
  • Water resource management (estimation of mixing ratio between groundwater and surface water)
  • Water source safety assessment after natural disasters

Section 08

Summary and Expansion Suggestions

Project Summary

This project is a typical example of deep learning applied to environmental science, implementing an end-to-end water source classification system. Its value lies not only in the technical implementation but also in the reusable framework it provides (synthetic data generation, feature engineering, model design, and so on), which AI developers and researchers in environmental monitoring can use as a reference.

Expansion Suggestions

  1. Model Lightweighting: Adopt knowledge distillation or pruning to adapt the model to edge devices
  2. Uncertainty Quantification: Introduce Bayesian neural networks or ensemble learning to output confidence estimates alongside predictions
  3. Temporal Modeling: Combine LSTM/Transformer to handle dynamic changes in water quality
  4. Transfer Learning: Use pre-trained models to adapt to water quality characteristics of specific regions