# Delhi Air Quality Prediction System: Guarding Respiratory Health with Machine Learning

> A machine learning-based real-time air quality prediction web application that provides Delhi residents with accurate AQI forecasts and health recommendations through automated data pipelines and Streamlit visualization.

- Board: [Openclaw Geo](https://www.zingnex.cn/en/forum/board/openclaw-geo)
- Published: 2026-05-09T10:56:08.000Z
- Last activity: 2026-05-09T10:59:44.868Z
- Popularity: 154.9
- Keywords: air quality, AQI prediction, machine learning, Streamlit, Docker, MLOps, environmental monitoring, random forest, feature engineering, automated pipeline
- Page link: https://www.zingnex.cn/en/forum/thread/geo-github-harshsharma5468-delhi-aqi-prediction
- Canonical: https://www.zingnex.cn/forum/thread/geo-github-harshsharma5468-delhi-aqi-prediction

---

## Introduction: Core Overview of the Delhi Air Quality Prediction System

The Delhi Air Quality Prediction System is a machine learning-based web application for real-time AQI prediction, built because Delhi residents lack a convenient tool for checking air quality as it changes. The system collects real-time pollutant data through an automated pipeline, predicts AQI with a Random Forest model, and presents health recommendations and historical trend analysis in a Streamlit dashboard. It is deployed in Docker containers and follows MLOps best practices, and it is intended to be useful to both residents and policymakers.

## Project Background: Real Pain Points of Delhi's Air Pollution

Delhi has long been among the world's most polluted cities, and its AQI varies sharply over time and across locations. Residents nevertheless lack real-time prediction tools: traditional monitoring stations report only historical data, with no forecasts and no personalized health guidance, which leaves groups such as outdoor workers and children exposed to health risks. This project builds an end-to-end machine learning system to close that information gap.

## Core Features and Technical Highlights

**Core Features**:

- Real-time AQI prediction
- Health risk classification (6 levels, each with recommendations)
- Historical pollutant trend visualization

**Technical Highlights**:

- Hourly automated data collection (GitHub Actions)
- Docker containerized deployment
- Model interpretability (feature importance display)
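
The six-level health risk classification can be sketched as a simple band lookup. The post does not list the project's cut-offs or advice text, so this example assumes the six bands of India's CPCB National AQI (Good, Satisfactory, Moderate, Poor, Very Poor, Severe). The function name `classify_aqi` and the recommendation strings are hypothetical placeholders.

```python
from bisect import bisect_left

# Assumption: CPCB National AQI band upper bounds; the project may use other cut-offs.
_UPPER_BOUNDS = [50, 100, 200, 300, 400]
_CATEGORIES = ["Good", "Satisfactory", "Moderate", "Poor", "Very Poor", "Severe"]

# Illustrative advice only; not taken from the project.
_ADVICE = {
    "Good": "Air quality is fine for normal outdoor activity.",
    "Satisfactory": "Sensitive people may want to limit long outdoor exertion.",
    "Moderate": "People with asthma or heart disease should reduce outdoor exertion.",
    "Poor": "Consider shortening outdoor activity, especially for children.",
    "Very Poor": "Avoid prolonged outdoor activity; consider wearing a mask.",
    "Severe": "Stay indoors where possible and avoid physical exertion outside.",
}


def classify_aqi(aqi: float) -> tuple[str, str]:
    """Map an AQI value to (category, recommendation)."""
    if aqi < 0:
        raise ValueError("AQI cannot be negative")
    # bisect_left returns the index of the first upper bound >= aqi,
    # so a value exactly on a boundary (e.g. 50) stays in the lower band.
    index = bisect_left(_UPPER_BOUNDS, aqi)
    category = _CATEGORIES[index]
    return category, _ADVICE[category]
```

With these assumed bands, `classify_aqi(151)` falls in the "Moderate" category, and any value above 400 is reported as "Severe".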

## Technical Architecture and Implementation Details

- **Data Layer**: Integrates real-time APIs (OpenWeatherMap/WAQI) and historical CPCB data; GitHub Actions automatically collects and preprocesses data hourly.
- **Feature Engineering**: Core pollutants (PM2.5/PM10, etc.) plus cyclically encoded time features; PM2.5/PM10 account for over 60% of feature importance.
- **Model Selection**: Random Forest (R²=0.94) outperforms XGBoost and linear regression.
- **Interface**: The Streamlit dashboard displays real-time AQI, trend charts, health recommendations, and model insights.
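
Cyclical encoding of time features can be illustrated with a short sketch. Mapping a periodic value onto a sine/cosine pair keeps neighbouring times close together, so 23:00 and 00:00 end up near each other instead of 23 units apart. The post does not say which time fields the project encodes; the choice of hour, weekday, and month here, and the names `cyclical_encode` and `time_features`, are assumptions.

```python
import math
from datetime import datetime


def cyclical_encode(value: float, period: float) -> tuple[float, float]:
    """Return the (sin, cos) coordinates of `value` on a circle of length `period`."""
    angle = 2.0 * math.pi * (value % period) / period
    return math.sin(angle), math.cos(angle)


def time_features(ts: datetime) -> dict[str, float]:
    """Build cyclically encoded features from a timestamp (hypothetical feature set)."""
    hour_sin, hour_cos = cyclical_encode(ts.hour, 24)
    dow_sin, dow_cos = cyclical_encode(ts.weekday(), 7)
    # Months run 1..12, so shift to 0..11 before encoding.
    month_sin, month_cos = cyclical_encode(ts.month - 1, 12)
    return {
        "hour_sin": hour_sin, "hour_cos": hour_cos,
        "dow_sin": dow_sin, "dow_cos": dow_cos,
        "month_sin": month_sin, "month_cos": month_cos,
    }
```

Both components are needed: the sine alone cannot distinguish, for example, 03:00 from 09:00, whereas the (sin, cos) pair identifies each hour uniquely.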

## Key Findings and Data Insights

- Seasonality: Winter AQI is 3 times higher than in summer (due to inversion layers, crop burning, and fireworks).
- Diurnal pattern: Pollution is most severe during the morning and evening rush hours.
- Dominant factors: PM2.5 and PM10 are the core drivers of AQI.
- Automation: Hourly data collection keeps the model's inputs current.
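
A diurnal pattern like the one described above is typically found by averaging readings by hour of day. The sketch below shows that aggregation using only the standard library. The function name `mean_aqi_by_hour` and the `(timestamp, aqi)` input shape are assumptions, not the project's actual analysis code.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def mean_aqi_by_hour(readings: list[tuple[datetime, float]]) -> dict[int, float]:
    """Average AQI per hour of day from (timestamp, aqi) pairs."""
    buckets: dict[int, list[float]] = defaultdict(list)
    for timestamp, aqi in readings:
        buckets[timestamp.hour].append(aqi)
    # Sort by hour so the result reads as a 0..23 daily profile.
    return {hour: mean(values) for hour, values in sorted(buckets.items())}
```

Grouping by `timestamp.month` instead of `timestamp.hour` would give the corresponding seasonal profile.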

## Deployment and Usage Guide

- **Docker Compose**: `git clone ... && docker-compose up --build`
- **Local Environment**: `pip install -r requirements.txt && streamlit run app.py`
- **Makefile Commands**: setup/run/docker/test

Before running, configure the API keys in the `.env` file; keys are free to obtain from OpenWeatherMap and WAQI.
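
Checking the `.env` file before startup can catch a missing API key early. The following is a minimal sketch of such a check using only the standard library. The post does not name the variables the project expects, so `OPENWEATHERMAP_API_KEY` and `WAQI_API_TOKEN` are hypothetical, as are the helper names `parse_env` and `missing_keys`.

```python
# Hypothetical variable names; check the project's own .env template for the real ones.
REQUIRED_KEYS = ("OPENWEATHERMAP_API_KEY", "WAQI_API_TOKEN")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str], required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]
```

A startup script could read the file with `pathlib.Path(".env").read_text()`, pass the result to `parse_env`, and stop with a clear message if `missing_keys` returns anything.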

## Project Value and Social Significance

The system gives residents a real-time basis for everyday decisions (e.g., whether to engage in outdoor activities) and gives policymakers data on pollution hotspots. Its open-source architecture can serve as a reference for other polluted cities, and the containerized design reduces maintenance costs, making it an example of technology supporting environmental protection.
