Zing Forum

Predicting Dengue Outbreaks with Machine Learning: An Exploration of Climate and Environmental Data-Driven Public Health Early Warning

This machine learning project predicts dengue outbreaks from climate and environmental data. By analyzing features such as temperature, precipitation, and vegetation index, it aims to give infectious disease prevention and control programs in tropical regions a data-driven early warning capability.

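Tags: dengue prediction, machine learning, climate data, public health, infectious disease prevention and control, time-series analysis, environmental monitoring, epidemiology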
Published 2026-05-14 03:56 | Recent activity 2026-05-14 04:07 | Estimated read 6 min

Section 01

[Introduction] Machine Learning-Driven Dengue Outbreak Early Warning: An Exploration of Public Health Applications of Climate and Environmental Data

This article looks at using machine learning to predict dengue outbreaks. Working from climate data (temperature, precipitation, and so on) and environmental data (NDVI), it explores data-driven public health early warning through the DengAI competition project. The goal is to predict weekly dengue case counts in San Juan and Iquitos, supporting a shift from passive response to proactive prevention and control of infectious diseases in tropical regions.

Section 02

Project Background: DengAI Competition and Multi-Source Climate & Environmental Dataset

This project originates from the DengAI competition on the DrivenData platform, which asks participants to predict weekly dengue cases in San Juan (Puerto Rico) and Iquitos (Peru). The competition provides several years of weekly observations, including temperature (maximum, minimum, and average), precipitation, relative humidity, and the Normalized Difference Vegetation Index (NDVI), sourced from NOAA and satellite remote sensing systems. The project's code repository includes the complete analysis workflow (exploratory analysis and prediction submission), a technical report, and a demonstration video.

Section 03

Technical Approach: Key Steps in Feature Engineering and Model Selection

Dengue prediction is a time-series regression problem whose two core steps are feature engineering and model selection. Feature engineering starts with preprocessing the raw climate data: missing values are imputed in a way that respects the time ordering of the series, and lag features are constructed because climate factors affect case counts with a delay rather than in the same week. Model selection combines gradient boosting trees (e.g., XGBoost) with time-series models (e.g., ARIMA, Prophet): the former are strong at capturing feature interactions in tabular data, while the latter capture periodic trends.
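
The two preprocessing ideas above can be illustrated with a minimal sketch in plain Python. The article does not say which imputation method or lag lengths the project uses, so forward filling, the two-week lag, and the names `forward_fill`, `lag`, and `weekly_precip_mm` are all assumptions made for illustration.

```python
from typing import Optional


def forward_fill(values: list[Optional[float]]) -> list[Optional[float]]:
    """Replace each missing value with the most recent earlier observation.

    Only past values are used, so no information from future weeks leaks
    into the imputed series.
    """
    filled: list[Optional[float]] = []
    last: Optional[float] = None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)  # stays None until the first observation appears
    return filled


def lag(values: list[Optional[float]], k: int) -> list[Optional[float]]:
    """Shift a weekly series by k weeks so row t holds the value from week t - k."""
    if k < 0:
        raise ValueError("k must be non-negative")
    # The first k rows have no history and are left as None.
    return ([None] * k + values)[: len(values)]


# Hypothetical weekly precipitation readings with two gaps.
weekly_precip_mm = [12.0, None, 30.5, 8.0, None, 21.0]

filled = forward_fill(weekly_precip_mm)
precip_lag_2 = lag(filled, 2)

print(filled)        # [12.0, 12.0, 30.5, 8.0, 8.0, 21.0]
print(precip_lag_2)  # [None, None, 12.0, 12.0, 30.5, 8.0]
```

In practice several lags per climate variable would usually be generated, and the rows without enough history would be dropped or handled by the model.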

Section 04

Biological Mechanisms Linking Climate Variables to Dengue Transmission

Clear biological mechanisms link climate variables to dengue transmission. Temperature affects both mosquito reproduction and virus replication, with 25-29°C being the optimal transmission window. Precipitation has a two-sided effect: moderate rainfall creates breeding sites, while excessive rainfall washes larvae away. NDVI reflects moisture and the state of the local ecosystem, supplementing ground-based meteorological data. These relationships are non-linear, and they form the theoretical basis for model construction.
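
One way to expose such a non-linear temperature effect to a model is to derive features from the window itself. The sketch below is only an illustration and is not taken from the project: the function name `temp_window_features` and the output feature names are hypothetical, and the only value drawn from the text is the 25-29°C window.

```python
OPTIMAL_TEMP_C = (25.0, 29.0)  # optimal transmission window quoted in the text


def temp_window_features(avg_temp_c: float) -> dict[str, float]:
    """Derive two illustrative features from a weekly average temperature."""
    low, high = OPTIMAL_TEMP_C
    # 1.0 when the temperature falls inside the window, else 0.0.
    in_window = 1.0 if low <= avg_temp_c <= high else 0.0
    # Distance in °C to the nearest edge of the window; 0.0 inside it.
    distance = max(low - avg_temp_c, avg_temp_c - high, 0.0)
    return {"temp_in_window": in_window, "temp_dist_from_window": distance}


print(temp_window_features(27.0))  # {'temp_in_window': 1.0, 'temp_dist_from_window': 0.0}
print(temp_window_features(31.5))  # {'temp_in_window': 0.0, 'temp_dist_from_window': 2.5}
```

Tree-based models can learn thresholds like this on their own, so explicit window features tend to matter more for linear or additive models.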

Section 05

Two-City Comparison: Exploration of Prediction Adaptability Under Different Climate Patterns

San Juan has a Caribbean tropical marine climate with pronounced seasonal peaks in cases, while Iquitos has an Amazon rainforest climate with a high baseline of cases and weak seasonality. These differences raise three model design questions: Can a unified model adapt to both patterns? Is city-specific training necessary? How does the encoding of the city identifier affect accuracy? Exploring them provides a methodological reference for model generalization.
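
The two data layouts behind these questions can be sketched as follows. This is a minimal illustration under assumptions, not the project's code: the city codes `sj` and `iq`, the field names, the sample values, and both helper functions are hypothetical.

```python
from collections import defaultdict

CITY_CODES = ("sj", "iq")  # assumed short codes for San Juan and Iquitos


def split_by_city(rows: list[dict]) -> dict[str, list[dict]]:
    """Group rows per city, as input for training one model per city."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        groups[row["city"]].append(row)
    return dict(groups)


def add_city_indicator(rows: list[dict]) -> list[dict]:
    """One-hot encode the city so a single unified model can tell the cities apart."""
    encoded = []
    for row in rows:
        new_row = {key: value for key, value in row.items() if key != "city"}
        for code in CITY_CODES:
            new_row[f"city_{code}"] = 1 if row["city"] == code else 0
        encoded.append(new_row)
    return encoded


# Hypothetical weekly rows; the values are made up for illustration.
rows = [
    {"city": "sj", "avg_temp_c": 27.1, "total_cases": 4},
    {"city": "iq", "avg_temp_c": 26.4, "total_cases": 0},
    {"city": "sj", "avg_temp_c": 28.0, "total_cases": 6},
]

per_city = split_by_city(rows)      # option 1: separate training sets
unified = add_city_indicator(rows)  # option 2: one training set with city flags

print(sorted(per_city))  # ['iq', 'sj']
print(unified[1])        # {'avg_temp_c': 26.4, 'total_cases': 0, 'city_sj': 0, 'city_iq': 1}
```

Comparing validation error between the two layouts is one straightforward way to answer whether city-specific training is worth the extra models.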

Section 06

Public Health Applications: Path from Prediction Model to Prevention & Control Decision-Making

Prediction models can support prevention and control decisions at several levels. At the strategic level, they inform resource allocation and supply stockpiling. At the tactical level, they help target mosquito control at priority areas. At the public communication level, they make messaging about protective behaviors more persuasive. However, moving from a competition model to real-world deployment requires addressing real-time data acquisition, model updates, communication of uncertainty, and integration with existing systems.

Section 07

Conclusion: The Future of Data Science Empowering Global Mosquito-Borne Disease Prevention and Control

As climate change increases infectious disease risks, the methodology demonstrated in this project (multi-source data integration, feature engineering, and model optimization) is transferable to predicting other mosquito-borne diseases such as malaria and Zika. The complete workflow and code provided by the project give later researchers a starting point. They also suggest that much of the value of large-scale data in public health has yet to be unlocked for addressing global health challenges.