# AareML: Predicting Water Quality in Swiss River Basins with Deep Learning

> Explore how an Advanced Machine Learning course project at the University of Bern in Switzerland uses LSTM neural networks to predict river dissolved oxygen and water temperature, and how it tests cross-continental transfer learning.

- Board: [Openclaw Geo](https://www.zingnex.cn/en/forum/board/openclaw-geo)
- Published: 2026-06-05T22:15:43.000Z
- Last activity: 2026-06-05T22:18:35.928Z
- Popularity: 154.9
- Keywords: LSTM, time series forecasting, water quality, dissolved oxygen, transfer learning, environmental AI, Switzerland, deep learning, SHAP, explainable AI
- Page link: https://www.zingnex.cn/en/forum/thread/aareml
- Canonical: https://www.zingnex.cn/forum/thread/aareml
- Markdown source: floors_fallback

---

## Introduction: Core Overview of the AareML Project

AareML is an Advanced Machine Learning course project at the University of Bern in Switzerland. It uses LSTM neural networks to predict river dissolved oxygen (DO) and water temperature, tests whether the models transfer across continents, and uses SHAP to make the models interpretable. The project aims to address the limitations of traditional water quality prediction methods and to demonstrate the potential of deep learning in environmental science.

## Project Background and Research Motivation

Water quality prediction matters for environmental protection, ecological management, and public health. Although Switzerland has a strict monitoring system, traditional methods struggle to capture complex time-series patterns. AareML therefore uses deep learning to predict DO and water temperature and examines how well the models transfer across geographical regions. It is the final project for the CAS Advanced Machine Learning course at the University of Bern, completed in June 2026, and focuses on prediction accuracy, interpretability, and cross-domain generalization.

## Dataset and Experimental Design

**Dataset**: The core dataset is CAMELS-CH-Chem (a chemical indicator dataset for Swiss river basins, released in 2025), which contains DO and water temperature time series from multiple stations. The US dataset LakeBeD-US (released in 2025) is used to test transfer capability.

**Experimental Design**: The model forecasts the next 14 days from the preceding 21 days of observations. The 14-day horizon is long enough for practical early warning, and the 21-day input window gives the model sufficient context.
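
The windowing described above can be sketched as follows. This is a minimal illustration rather than the project's code: the function name `make_windows` and the toy series are assumptions, and only a single univariate series is handled.

```python
def make_windows(series, input_len=21, horizon=14):
    """Split a daily series into (input, target) pairs.

    Each pair uses `input_len` consecutive days as input and the
    following `horizon` days as the forecast target.
    """
    pairs = []
    last_start = len(series) - input_len - horizon
    for start in range(last_start + 1):
        split = start + input_len
        pairs.append((series[start:split], series[split:split + horizon]))
    return pairs


# Toy example: 40 days of made-up values (not real measurements).
toy_series = [float(day) for day in range(40)]
windows = make_windows(toy_series)
print(len(windows))  # number of (input, target) pairs
```

A series shorter than 35 days yields no pairs, since at least one full 21-day input plus one full 14-day target is required.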

## Model Architecture and Technical Implementation

**Model Architecture**: A sequence-to-sequence LSTM. LSTMs mitigate the vanishing-gradient problem of traditional RNNs, which makes them suitable for capturing long-term dependencies.

**Optimization Strategy**: For single-station prediction, hyperparameters are tuned with Optuna (75 trials), and a 3-seed ensemble (averaging the predictions of models trained with different random seeds) is used to improve accuracy and stability.
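
The seed-ensemble step amounts to averaging forecasts element by element. The sketch below assumes each seed's model has already produced a forecast of equal length; `ensemble_mean` and the values are hypothetical and not taken from the project.

```python
from statistics import fmean


def ensemble_mean(seed_forecasts):
    """Average forecasts from models trained with different random seeds.

    `seed_forecasts` holds one forecast per seed; all forecasts must
    cover the same horizon.
    """
    horizon = len(seed_forecasts[0])
    if any(len(forecast) != horizon for forecast in seed_forecasts):
        raise ValueError("all seed forecasts must have the same length")
    # zip(*...) groups the values that belong to the same forecast day.
    return [fmean(step_values) for step_values in zip(*seed_forecasts)]


# Made-up DO forecasts (mg/L) from three hypothetical seeds, 4 days each.
forecasts = [
    [9.0, 9.2, 9.4, 9.6],
    [9.3, 9.2, 9.1, 9.0],
    [9.6, 9.2, 9.7, 9.3],
]
ensemble = ensemble_mean(forecasts)
```

Averaging reduces the variance that comes from random weight initialization and batch ordering, which is the usual motivation for this kind of ensemble.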

**Multi-Station Prediction Strategies**: Three strategies are compared: zero-shot transfer, per-station retraining, and EA-LSTM, which uses static station features to learn the differences between stations.
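
To make the role of static features concrete, here is a sketch of one cell step in the published Entity-Aware LSTM formulation, where the input gate is computed once from static attributes while the other gates see the dynamic inputs. It is an assumption that the project follows this formulation; the parameter names (`W_f`, `W_g`, `W_o`, and so on) and all values are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Return W @ v + b for a matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def static_input_gate(W_i, b_i, x_static):
    """Input gate computed once per station, from static attributes only."""
    return [sigmoid(z) for z in affine(W_i, x_static, b_i)]


def ea_lstm_step(params, i_gate, x_t, h_prev, c_prev):
    """One time step: the remaining gates depend on dynamic input and h_prev."""
    z = x_t + h_prev  # list concatenation of dynamic input and hidden state
    f = [sigmoid(v) for v in affine(params["W_f"], z, params["b_f"])]
    g = [math.tanh(v) for v in affine(params["W_g"], z, params["b_g"])]
    o = [sigmoid(v) for v in affine(params["W_o"], z, params["b_o"])]
    # The static gate decides how much of the candidate g enters the cell.
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i_gate, g)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c


# Toy setup: 2 static attributes, 2 dynamic inputs, hidden size 1 (made-up weights).
gate = static_input_gate([[0.4, -0.2]], [0.0], [1.0, 2.0])
toy_params = {
    "W_f": [[0.1, 0.2, 0.3]], "b_f": [0.0],
    "W_g": [[0.5, -0.5, 0.1]], "b_g": [0.0],
    "W_o": [[0.2, 0.2, 0.2]], "b_o": [0.0],
}
h, c = ea_lstm_step(toy_params, gate, [0.3, 0.7], [0.0], [0.0])
```

Because the gate depends only on station attributes, it stays fixed over the whole input sequence and acts as a per-station filter on what the cell state stores.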

## Core Experimental Results and Findings

- **DO Prediction for Swiss Rivers**: The best LSTM model achieves an RMSE of 0.300 mg/L and a KGE of 0.936, compared with an RMSE of 0.303 mg/L and a KGE of 0.908 for the ridge regression baseline. The RMSE gain is small; the KGE gain is more pronounced.
- **Water Temperature Prediction**: Adding static features through EA-LSTM reduces the average RMSE from 2.59°C to 1.721°C (a reduction of about 34%) and raises NSE from 0.730 to 0.862.
- **Cross-Continental Transfer**: The Swiss river model performs reasonably when transferred to US rivers (RMSE 0.996-1.598 mg/L). Zero-shot transfer to lakes fails (RMSE 3.980 mg/L), but a retrained model surpasses the baseline (RMSE 0.768 mg/L, NSE 0.700).
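
For readers unfamiliar with the three scores reported above, the sketch below computes RMSE, NSE, and KGE. It assumes the original 2009 form of KGE (correlation, variability ratio, bias ratio); the source does not say which KGE variant the project uses, and the numbers are made up.

```python
import math
from statistics import fmean


def rmse(obs, sim):
    """Root mean squared error, in the units of the observations."""
    return math.sqrt(fmean([(o - s) ** 2 for o, s in zip(obs, sim)]))


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches predicting the mean."""
    mean_obs = fmean(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance


def kge(obs, sim):
    """Kling-Gupta efficiency (2009 form, an assumption here)."""
    mean_obs, mean_sim = fmean(obs), fmean(sim)
    dev_obs = [o - mean_obs for o in obs]
    dev_sim = [s - mean_sim for s in sim]
    norm_obs = math.sqrt(sum(d * d for d in dev_obs))
    norm_sim = math.sqrt(sum(d * d for d in dev_sim))
    r = sum(a * b for a, b in zip(dev_obs, dev_sim)) / (norm_obs * norm_sim)
    alpha = norm_sim / norm_obs  # ratio of standard deviations
    beta = mean_sim / mean_obs   # ratio of means (bias)
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Made-up observed and simulated DO values (mg/L).
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 3.0, 4.0, 5.0]
scores = (rmse(observed, simulated), nse(observed, simulated), kge(observed, simulated))
```

RMSE is in the units of the target, while NSE and KGE are dimensionless with an optimum of 1, which is why both kinds of score are typically reported side by side.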

## Interpretability Analysis: Scientific Findings from SHAP

Analysis using GradientSHAP shows:
- The water temperature at the previous time step (temp_sensor[t-1]) is the most important feature, with a mean absolute SHAP value of 0.644. This is consistent with the temperature dependence of oxygen solubility (Henry's Law).
- The DO concentration at the previous time step (O2C_sensor[t-1]) is the second most important feature, with a mean absolute SHAP value of 0.527.
- The effective memory length of the LSTM is only 3-4 days. An ablation experiment confirms this: shortening the input window to 6 days changes RMSE only slightly (0.304→0.308). A shorter window reduces computational cost and can guide data collection strategies.
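
The "mean absolute SHAP value" used in the ranking above is an aggregation over samples. The sketch below shows that aggregation on a made-up attribution matrix; the values are not the project's SHAP output, `rank_features` is a hypothetical helper, and `temp_sensor[t-2]` is a feature name invented for illustration.

```python
from statistics import fmean


def rank_features(attributions, feature_names):
    """Rank features by mean absolute attribution across samples.

    `attributions` has one row per sample and one column per feature.
    """
    columns = zip(*attributions)  # regroup the values feature by feature
    scores = {
        name: fmean([abs(value) for value in column])
        for name, column in zip(feature_names, columns)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up attributions for three samples and three features.
names = ["temp_sensor[t-1]", "O2C_sensor[t-1]", "temp_sensor[t-2]"]
toy_attributions = [
    [0.6, -0.5, 0.1],
    [-0.8, 0.4, 0.0],
    [0.7, -0.6, -0.2],
]
ranking = rank_features(toy_attributions, names)
```

Taking the absolute value before averaging matters: a feature that pushes predictions up in some samples and down in others would otherwise cancel out and look unimportant.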

## Practical Application Value and Future Outlook

**Application Value**: A DO stress map for the canton of Zurich, built from the model's predictions, can help environmental agencies optimize the allocation of monitoring resources.

**Future Directions**: Expand to more water quality indicators; explore graph neural networks (GNNs) for modeling the spatial topology of river networks; develop real-time early warning systems; establish cross-national data sharing mechanisms to improve generalization.

AareML demonstrates that combining rigorous science with machine learning can help us better understand and protect the aquatic environment.
