Zing Forum

Serverless AQI Prediction System: A Complete MLOps Practice from Feature Engineering to Automated Model Retraining

An end-to-end serverless machine learning pipeline that predicts the Air Quality Index (AQI) for the next 3 days, integrating GitHub Actions for automatic retraining, the Hopsworks feature store, and a production-grade dashboard.

Tags: AQI, Air Quality, Machine Learning, MLOps, Serverless, GitHub Actions, Hopsworks, Feature Engineering, Time-Series Forecasting
Published 2026-05-31 03:15 | Recent activity 2026-05-31 03:20 | Estimated read 5 min

Section 01

Introduction: Core MLOps Practices of the Serverless AQI Prediction System

This project is an end-to-end serverless machine learning pipeline that predicts the Air Quality Index (AQI) for the next 3 days. It combines GitHub Actions for automatic retraining, the Hopsworks feature store, and a production-grade dashboard. The goal is to offset the lag in conventionally published AQI figures by giving the public an early forecast for health protection. The project also serves as a case study in MLOps practice.

Section 02

Background: Why Do We Need an Automated AQI Prediction System?

The Air Quality Index (AQI) affects daily decisions such as outdoor activities and health protection, but conventionally published AQI figures lag behind current conditions. ML-based models can forecast several days ahead, yet building a production-grade system raises its own challenges: processing multi-source data, engineering features, and keeping the model current. By combining a serverless architecture with automated MLOps workflows, this project lets an individual developer deploy an enterprise-level prediction service.

Section 03

Data Collection and Automated Feature Engineering

AQI prediction requires integrating pollutant concentrations (PM2.5, PM10, NO2, ozone, sulfur dioxide, carbon monoxide, etc.) and meteorological data (temperature, humidity, wind speed, air pressure, etc.). The project implements hourly automated feature engineering, including data cleaning, missing value imputation, feature scaling, and time-series feature construction, to ensure the data input to the model is up-to-date and complete.
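
As a rough illustration of the imputation and time-series feature steps described above, here is a minimal sketch in plain Python. It is not the project's actual code: the function names (forward_fill, build_features), the choice of forward-fill imputation, the lag set, and the 3-hour rolling window are all assumptions made for the example, and feature scaling is left out for brevity.

```python
from statistics import mean

def forward_fill(values):
    """Replace None with the most recent observed value (leading gaps stay None)."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

def build_features(pm25, lags=(1, 2, 3), window=3):
    """Build one feature row per hour, using only values from earlier hours."""
    series = forward_fill(pm25)
    start = max(max(lags), window)  # first index with enough history
    rows = []
    for t in range(start, len(series)):
        if any(v is None for v in series[t - start:t + 1]):
            continue  # skip rows that still contain an unfilled leading gap
        row = {f"pm25_lag_{k}": series[t - k] for k in lags}
        row[f"pm25_roll_mean_{window}"] = mean(series[t - window:t])
        row["target"] = series[t]
        rows.append(row)
    return rows

# Hypothetical hourly PM2.5 readings with one missing value.
hourly_pm25 = [12.0, 15.0, None, 18.0, 21.0, 24.0]
features = build_features(hourly_pm25)
for row in features:
    print(row)
```

Each row is built only from hours before the target hour, so the same function can serve both training and inference without leaking future values into the features.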

Section 04

Model Training and Automatic Retraining Mechanism

Air quality data is a time series whose behavior shifts over time: seasonal changes and variations in pollution sources can degrade the performance of a model trained on older data. The project uses GitHub Actions to retrain the model automatically every day, so that it learns from the latest data, does not go stale, and maintains its prediction accuracy.
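
A scheduled job, such as a daily GitHub Actions workflow, could call a retraining script along the following lines. This is a sketch under stated assumptions rather than the project's implementation: the AR(1) model, the function names (fit_ar1, mae, retrain), and the rule of keeping the new model only if it beats the current one on recent data are all hypothetical choices made to keep the example small.

```python
from statistics import mean

def fit_ar1(series):
    """Least-squares fit of y[t] = a * y[t-1] + b; returns (a, b)."""
    x, y = series[:-1], series[1:]
    mx, my = mean(x), mean(y)
    var = sum((xi - mx) ** 2 for xi in x)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = cov / var if var else 0.0
    return a, my - a * mx

def mae(model, series):
    """Mean absolute error of one-step-ahead predictions over a series."""
    a, b = model
    return mean([abs(a * prev + b - cur) for prev, cur in zip(series[:-1], series[1:])])

def retrain(history, current_model, holdout=4):
    """Fit on older data, then keep whichever model scores better on recent data."""
    train = history[:-holdout]
    recent = history[-(holdout + 1):]  # one extra point to seed the first prediction
    candidate = fit_ar1(train)
    if current_model is None or mae(candidate, recent) < mae(current_model, recent):
        return candidate, True
    return current_model, False

# Hypothetical daily AQI history and a stale model that just repeats yesterday's value.
history = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
stale_model = (1.0, 0.0)
model, promoted = retrain(history, stale_model)
print(model, promoted)
```

Comparing the candidate against the current model before replacing it is one way to keep a bad training run from silently degrading forecasts. Whether the original project includes such a gate is not stated in the source.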

Section 05

Key Role of the Hopsworks Feature Store

The project uses the Hopsworks feature store to address several core ML pain points. It provides feature consistency (the same logic for training and inference), feature reuse, feature lineage tracking, and time travel (access to feature states at any historical point in time). It also decouples training from inference, which improves system maintainability.
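
To make the idea of looking up feature values as of a past moment concrete, the sketch below uses a toy in-memory class. It does not use the Hopsworks API and is only a conceptual stand-in: FeatureTimeline and its methods are hypothetical names, and a real feature store tracks far more (schemas, versions, commits, lineage).

```python
from bisect import bisect_right
from datetime import datetime

class FeatureTimeline:
    """Toy stand-in for one stored feature: values keyed by event time."""

    def __init__(self):
        self._times = []
        self._values = []

    def insert(self, event_time, value):
        # Keep rows sorted by event time so lookups can use binary search.
        i = bisect_right(self._times, event_time)
        self._times.insert(i, event_time)
        self._values.insert(i, value)

    def as_of(self, when):
        """Return the latest value recorded at or before `when`, else None."""
        i = bisect_right(self._times, when)
        return self._values[i - 1] if i else None

# Hypothetical hourly PM2.5 values written by the feature pipeline.
pm25 = FeatureTimeline()
pm25.insert(datetime(2026, 5, 30, 8), 35.0)
pm25.insert(datetime(2026, 5, 30, 9), 42.0)

print(pm25.as_of(datetime(2026, 5, 30, 8, 30)))  # value known at 08:30
print(pm25.as_of(datetime(2026, 5, 30, 7)))      # nothing recorded yet
```

Because training and inference both read through the same as_of lookup, they see features computed by the same logic, and a training set can be rebuilt using only the values that were available at each historical timestamp.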

Section 06

Advantages of Serverless Architecture

Serverless architecture offers three major benefits: cost optimization (billed by actual computing time, suitable for event-driven scenarios), automatic scaling (auto-scales when traffic surges), and simplified operations (no need to manage servers, focus on business logic).

Section 07

Production-Grade Dashboard Design and User Value

The project provides an intuitive dashboard showing real-time AQI values, the trend for the next 3 days, pollutant breakdowns, and health advice prompts. It turns complex prediction results into user-friendly visualizations that support public health decisions.
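
The health advice prompts could be driven by a simple lookup from a forecast AQI value to a category. The sketch below assumes the US EPA category bands (the source does not say which AQI standard the project follows). The advice strings and the describe_aqi function name are illustrative only.

```python
# (upper bound, category, advice); bands follow the US EPA AQI categories.
AQI_BANDS = [
    (50, "Good", "Air quality is satisfactory."),
    (100, "Moderate", "Unusually sensitive people should consider limiting long outdoor exertion."),
    (150, "Unhealthy for Sensitive Groups", "Sensitive groups should reduce outdoor exertion."),
    (200, "Unhealthy", "Everyone should reduce prolonged outdoor exertion."),
    (300, "Very Unhealthy", "Everyone should avoid prolonged outdoor exertion."),
]

def describe_aqi(aqi):
    """Map an AQI value to a (category, advice) pair."""
    if aqi < 0:
        raise ValueError("AQI cannot be negative")
    for upper, category, advice in AQI_BANDS:
        if aqi <= upper:
            return category, advice
    return "Hazardous", "Everyone should avoid outdoor activity."

# Hypothetical 3-day forecast produced by the model.
forecast = {"day+1": 42, "day+2": 118, "day+3": 205}
for day, aqi in forecast.items():
    category, advice = describe_aqi(aqi)
    print(f"{day}: AQI {aqi} ({category}) - {advice}")
```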

Section 08

Summary and Reusable Practices

This project is a useful MLOps case study that demonstrates the transition from lab prototype to production system. Reusable practices include an automation-first approach, features as code, monitoring and observability (recommended), and progressive deployment. The architecture is general-purpose and can be extended to other prediction scenarios such as pollen concentration and UV index, which makes it a good reference for MLOps learners.