Zing Forum

Delhi Air Quality Prediction System: Guarding Respiratory Health with Machine Learning

A machine learning-based real-time air quality prediction web application that provides Delhi residents with accurate AQI forecasts and health recommendations through automated data pipelines and Streamlit visualization.

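Tags: air quality, AQI prediction, machine learning, Streamlit, Docker, MLOps, environmental monitoring, random forest, feature engineering, automated pipelines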
Published 2026-05-09 18:56 | Recent activity 2026-05-09 18:59 | Estimated read 5 min

Section 01

Introduction: Core Overview of the Delhi Air Quality Prediction System

The Delhi Air Quality Prediction System is a machine learning-based web application for real-time AQI prediction, built because Delhi residents lack a convenient tool for checking and anticipating air quality. The system acquires real-time pollutant data through an automated data pipeline, predicts AQI with a Random Forest model, and presents health recommendations and historical trend analysis in a Streamlit dashboard. It is deployed in Docker containers and follows MLOps best practices, which makes it useful to both residents and policymakers.

Section 02

Project Background: Real Pain Points of Delhi's Air Pollution

Delhi has long been among the world's most polluted cities, with AQI fluctuating sharply over time and across locations, yet residents lack real-time prediction tools. Traditional monitoring stations provide only historical data, with no forecasts or personalized health guidance, which leaves groups such as outdoor workers and children exposed to health risks. This project builds an end-to-end machine learning system to close that information gap.

Section 03

Core Features and Technical Highlights

  • Core Features: real-time AQI prediction; health risk classification (6 levels, each with recommendations); historical pollutant trend visualization.
  • Technical Highlights: hourly automated data collection (GitHub Actions); containerized deployment with Docker; model interpretability (feature importance display).
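To illustrate the six-level health risk classification, here is a minimal sketch. It assumes the category bands of India's National AQI scale (Good, Satisfactory, Moderate, Poor, Very Poor, Severe); the article does not say which scale the project uses, and the function name and recommendation strings are placeholders rather than the project's own.

```python
from bisect import bisect_left
from typing import Tuple

# Upper bound of each band except the last. These cut-offs ASSUME India's
# National AQI scale; the project may use different thresholds.
UPPER_BOUNDS = [50, 100, 200, 300, 400]

# (level name, placeholder recommendation) for each of the 6 levels.
LEVELS = [
    ("Good", "Outdoor activity is fine for everyone."),
    ("Satisfactory", "Sensitive people should watch for symptoms."),
    ("Moderate", "Sensitive groups should limit prolonged exertion outdoors."),
    ("Poor", "Reduce prolonged outdoor exertion."),
    ("Very Poor", "Avoid outdoor exertion; consider wearing a mask."),
    ("Severe", "Stay indoors as much as possible."),
]


def classify_aqi(aqi: float) -> Tuple[str, str]:
    """Map an AQI value to a (level, recommendation) pair."""
    if aqi < 0:
        raise ValueError("AQI cannot be negative")
    # bisect_left returns the index of the first bound >= aqi,
    # so a value exactly on a bound stays in the lower band.
    return LEVELS[bisect_left(UPPER_BOUNDS, aqi)]


if __name__ == "__main__":
    for value in (42, 180, 365, 470):
        level, advice = classify_aqi(value)
        print(f"AQI {value}: {level} - {advice}")
```

Keeping the thresholds in one list means a different scale can be swapped in without touching the lookup logic.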

Section 04

Technical Architecture and Implementation Details

  • Data Layer: integrates real-time APIs (OpenWeatherMap/WAQI) and historical CPCB data; GitHub Actions collects and preprocesses data automatically every hour.
  • Feature Engineering: core pollutants (PM2.5/PM10, etc.) plus cyclically encoded time features; PM2.5 and PM10 together account for over 60% of feature importance.
  • Model Selection: Random Forest (R²=0.94) outperforms XGBoost and linear regression.
  • Interface: a Streamlit dashboard displays real-time AQI, trend charts, health recommendations, and model insights.
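The cyclical encoding of time features mentioned under Feature Engineering can be sketched as follows. This is an illustration of the general technique (mapping a periodic value onto the unit circle with sine and cosine), not the project's actual code; the function names and the choice of hour and month as inputs are assumptions.

```python
import math
from typing import Dict, Tuple


def encode_cyclic(value: float, period: float) -> Tuple[float, float]:
    """Map a periodic value to (sin, cos) coordinates on the unit circle."""
    angle = 2.0 * math.pi * (value % period) / period
    return math.sin(angle), math.cos(angle)


def time_features(hour: int, month: int) -> Dict[str, float]:
    """Build cyclic features for hour of day (0-23) and month (1-12).

    The feature names are hypothetical, chosen for this example only.
    """
    hour_sin, hour_cos = encode_cyclic(hour, 24)
    month_sin, month_cos = encode_cyclic(month - 1, 12)  # shift month to 0-11
    return {
        "hour_sin": hour_sin,
        "hour_cos": hour_cos,
        "month_sin": month_sin,
        "month_cos": month_cos,
    }


if __name__ == "__main__":
    # 23:00 and 00:00 land next to each other on the circle,
    # while 00:00 and 12:00 sit on opposite sides.
    print(encode_cyclic(23, 24), encode_cyclic(0, 24), encode_cyclic(12, 24))
    print(time_features(hour=8, month=11))
```

The point of the encoding is that a raw hour column would treat 23 and 0 as far apart, whereas the sine/cosine pair keeps adjacent hours (and adjacent months such as December and January) close together.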

Section 05

Key Findings and Data Insights

  • Seasonality: Winter AQI is 3 times higher than in summer (due to inversion layers, crop burning, and fireworks).
  • Diurnal pattern: Pollution is most severe during the morning and evening rush hours.
  • Dominant factors: PM2.5 and PM10 are the core drivers of AQI.
  • Automation: Hourly data collection keeps the model's predictions based on current data.

Section 06

Deployment and Usage Guide

  • Docker Compose: git clone ... && docker-compose up --build
  • Local Environment: pip install -r requirements.txt && streamlit run app.py
  • Makefile Commands: setup, run, docker, test
  • Configuration: the API keys must be set in the .env file (keys are free to obtain from OpenWeatherMap/WAQI).
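Because the app depends on API keys supplied through the .env file, a startup check that fails early with a clear message can save debugging time. The sketch below is one way to do that, not the project's code: the variable names OPENWEATHER_API_KEY and WAQI_API_TOKEN and the helper load_api_keys are hypothetical, so check the repository for the names it actually expects.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical variable names; the project's .env may use different ones.
REQUIRED_KEYS = ("OPENWEATHER_API_KEY", "WAQI_API_TOKEN")


def load_api_keys(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the required API keys, raising if any is missing or blank."""
    source = os.environ if env is None else env
    missing = [name for name in REQUIRED_KEYS if not source.get(name, "").strip()]
    if missing:
        raise RuntimeError("Missing API keys: " + ", ".join(missing))
    return {name: source[name].strip() for name in REQUIRED_KEYS}


if __name__ == "__main__":
    try:
        keys = load_api_keys()
        print("Loaded keys for:", ", ".join(keys))
    except RuntimeError as err:
        print(err)
```

The function reads from the process environment by default, so the .env values need to be exported first, for example by the container runtime or by a loader such as python-dotenv when running locally.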

Section 07

Project Value and Social Significance

The system gives residents a real-time basis for everyday decisions (e.g., whether to engage in outdoor activities) and gives policymakers pollution hotspot data. Its open-source architecture can serve as a reference for other polluted cities, and the containerized design reduces maintenance costs, making it an example of technology supporting environmental protection.