Zing Forum


Machine Learning Model Drift Monitoring System in Production: From Theory to Practice

A complete MLOps graduation project demonstrating how to build an end-to-end model monitoring system, detect data drift using PSI and KS tests, and issue early warnings before model performance degrades.

Tags: MLOps, Model Monitoring, Data Drift, Concept Drift, PSI, KS Test, Evidently AI, MLflow, Airflow, Fraud Detection
Published 2026-05-28 23:16 | Recent activity 2026-05-28 23:18 | Estimated read 6 min

Section 01

Model Drift Monitoring System in Production: Core Value and Project Overview

This article introduces a complete MLOps graduation project that builds a drift monitoring system for machine learning models in production. Targeting imbalanced classification scenarios such as credit card fraud detection, the project uses PSI and KS tests to detect data drift and raise early warnings before model performance degrades. The tech stack includes Evidently AI (drift reports), MLflow (experiment tracking), Apache Airflow (pipeline orchestration), and Streamlit (real-time dashboard). The results support distribution monitoring as an effective early warning signal.


Section 02

Necessity of Model Monitoring and Project Background

Machine learning models in production are prone to failure caused by data drift or concept drift, yet many teams lack effective monitoring after deployment. This project targets credit card fraud detection, where fraudulent transactions account for only 0.17% of the data, and builds an end-to-end monitoring framework around an XGBoost model. It simulates 12 weeks of production operation and injects controlled drift to verify that the system detects issues early. The tech stack covers Evidently AI, MLflow, Airflow, and Streamlit.


Section 03

Technical Implementation and Monitoring Pipeline Design

Core Detection Methods:

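  • PSI (Population Stability Index): Measures how far a feature's current distribution has shifted from the reference distribution, with alert thresholds of 0.10 (warning) and 0.20 (critical);
  • KS Test: The two-sample Kolmogorov-Smirnov test compares the empirical cumulative distribution functions of the reference and current data to detect changes in distribution shape.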
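As a rough illustration of the two statistics above, here is a minimal standard-library sketch. The function names, the choice of 10 quantile bins taken from the reference sample, and the `eps` floor for empty bins are assumptions made for this example and are not taken from the project. In practice a library routine such as `scipy.stats.ks_2samp` would normally replace the hand-written KS statistic and would also return a p-value.

```python
import bisect
import math
import random


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index between two numeric samples.

    Bin edges are quantiles of the reference sample (an assumption of
    this sketch), so each bin holds roughly 1/n_bins of the reference.
    """
    ref_sorted = sorted(reference)
    # Interior cut points at the 1/n_bins, 2/n_bins, ... reference quantiles.
    edges = [ref_sorted[(len(ref_sorted) * i) // n_bins] for i in range(1, n_bins)]

    def bin_shares(sample):
        counts = [0] * n_bins
        for x in sample:
            counts[bisect.bisect_right(edges, x)] += 1
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(sample), eps) for c in counts]

    ref_p = bin_shares(reference)
    cur_p = bin_shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


def ks_statistic(reference, current):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    ref_sorted = sorted(reference)
    cur_sorted = sorted(current)
    gap = 0.0
    # The largest gap is always reached at one of the observed values.
    for x in ref_sorted + cur_sorted:
        ref_cdf = bisect.bisect_right(ref_sorted, x) / len(ref_sorted)
        cur_cdf = bisect.bisect_right(cur_sorted, x) / len(cur_sorted)
        gap = max(gap, abs(ref_cdf - cur_cdf))
    return gap


# Synthetic data for illustration only: one stable sample, one shifted by +1.
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(5000)]
stable = [rng.gauss(0.0, 1.0) for _ in range(5000)]
shifted = [rng.gauss(1.0, 1.0) for _ in range(5000)]

print("stable :", psi(reference, stable), ks_statistic(reference, stable))
print("shifted:", psi(reference, shifted), ks_statistic(reference, shifted))
```

With these synthetic samples the stable batch should stay well under the 0.10 warning threshold, while the shifted batch should exceed the 0.20 critical threshold.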

Monitoring Pipeline:

  1. Training Phase: Train the XGBoost model, log to MLflow, and save the reference distribution;
  2. Production Simulation: Generate 12 weeks of data, inject drift in stages (no drift → gradual → critical);
  3. Monitoring Execution: Generate Evidently reports, calculate PSI/KS, trigger alerts, and log to MLflow;
  4. Alert Management: Tiered response (warning → close monitoring, critical → immediate investigation);
  5. Visualization: Streamlit dashboard displays alert status, trends, and reports.
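
The alerting step of the pipeline above could be sketched as follows. The thresholds (0.10 and 0.20) and the tiered responses come from the article. The function names, the feature names `amount` and `v14`, the per-feature PSI dictionary, and the choice to treat a value exactly at a threshold as crossing it are assumptions for illustration only.

```python
WARNING_PSI = 0.10   # warning threshold stated in the article
CRITICAL_PSI = 0.20  # critical threshold stated in the article

# Tiered responses from the article; "no action" for Normal is an assumption.
RESPONSES = {
    "Normal": "no action",
    "Warning": "close monitoring",
    "Critical": "immediate investigation",
}


def classify(max_psi):
    """Map the largest per-feature PSI of a batch to an alert status."""
    if max_psi >= CRITICAL_PSI:
        return "Critical"
    if max_psi >= WARNING_PSI:
        return "Warning"
    return "Normal"


def weekly_alert(week, feature_psi):
    """Summarize one monitoring run from a {feature name: PSI} mapping."""
    worst_feature = max(feature_psi, key=feature_psi.get)
    max_psi = feature_psi[worst_feature]
    status = classify(max_psi)
    return {
        "week": week,
        "feature": worst_feature,
        "max_psi": max_psi,
        "status": status,
        "response": RESPONSES[status],
    }


# Hypothetical feature names; 0.106 is the Week 5 maximum PSI from the article.
alert = weekly_alert(5, {"amount": 0.106, "v14": 0.004})
print(alert)
```

Driving the status from the maximum PSI across features matches the "Max PSI" column reported in the experiment, so a single badly drifted feature is enough to raise the alert level.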

Section 04

Experimental Data: Effectiveness of PSI as a Leading Indicator

Experimental results show:

Week   F1 Score    PR-AUC        Max PSI       Status
1-4    0.97-1.00   0.995-1.000   0.003-0.005   Normal
5      0.937       1.000         0.106         Warning
6      0.921       0.882         0.201         Critical
12     0.929       0.938         2.958         Critical

Key insight: PSI triggered a warning in Week 5 (0.106) and crossed the critical threshold in Week 6 (0.201), yet the F1 score remained above 0.91 throughout the 12 weeks. This indicates that in imbalanced classification, input drift is not immediately reflected in performance metrics, which makes PSI a more timely early warning signal.
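
The lead-time argument can be checked against the reported rows with a few lines of code. This is only a sketch: weeks 1-4 are collapsed into a single row using the least favorable values of the reported ranges (F1 0.97, PSI 0.005), weeks 7-11 are not reported and are therefore absent, and the 0.91 F1 floor is taken from the observation above rather than from a threshold the project defines.

```python
PSI_WARNING = 0.10  # warning threshold from the article
F1_FLOOR = 0.91     # F1 level the article says was never breached

# (week, F1, max PSI) rows as reported; weeks 1-4 collapsed to one row
# using the worst values of the reported ranges.
REPORTED = [
    (4, 0.97, 0.005),
    (5, 0.937, 0.106),
    (6, 0.921, 0.201),
    (12, 0.929, 2.958),
]


def first_week(rows, predicate):
    """Return the first week whose row satisfies predicate, or None."""
    for row in rows:
        if predicate(row):
            return row[0]
    return None


psi_warning_week = first_week(REPORTED, lambda r: r[2] >= PSI_WARNING)
f1_breach_week = first_week(REPORTED, lambda r: r[1] < F1_FLOOR)

print("first PSI warning week:", psi_warning_week)
print("first F1 breach week:", f1_breach_week)
```

On these rows the PSI warning fires while the F1 check never does, which is the gap between the two signals that the key insight describes.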


Section 05

Key Insights and Best Practice Summary

  1. Input Distribution Monitoring is Indispensable: Performance metrics lag in imbalanced classification, so in this experiment distribution monitoring was the only signal that warned of drift early;
  2. PSI is a Leading Indicator: Detects issues ahead of business metrics;
  3. Tiered Response Strategy: Monitor warnings closely and act immediately on critical alerts; separating the two tiers avoids unnecessary model updates;
  4. Complete Observability: Combine metrics (MLflow), logs, pipeline run tracking (Airflow), reports (Evidently), and dashboards (Streamlit) to ensure system visibility.

Section 06

Production Deployment Improvements and Alert Handling Guidelines

Production Deployment Recommendations:

  • Data Ingestion: Pull data from data warehouses or streaming consumers;
  • Alert Delivery: Integrate with Slack/PagerDuty;
  • Authentication: Add SSO;
  • Multi-Model Support: Model registry;
  • Reference Data Refresh: Automated versioning;
  • Testing & CI: Unit test coverage and CI processes.

Alert Handling Process:

  • Warning: Check the Evidently reports, identify the cause of the drift, and keep monitoring for 2 weeks;
  • Critical: Verify that the data pipeline is working correctly, investigate the drift type, and retrain immediately if performance drops below the threshold;
  • Retraining: Train with the latest 90 days of data; deploy only models with improved performance after evaluation.
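
The retraining rule above can be expressed as a small gate. The 90-day window and the "deploy only if improved" condition come from the guidelines. The function names, the `min_gain` margin, the row layout with a `date` key, and the use of PR-AUC as the comparison metric are assumptions for this sketch.

```python
from datetime import date, timedelta

RETRAIN_WINDOW_DAYS = 90  # retraining window stated in the guidelines


def training_window(as_of):
    """Return the (start, end) dates of the most recent 90-day window."""
    return as_of - timedelta(days=RETRAIN_WINDOW_DAYS), as_of


def select_rows(rows, as_of):
    """Keep rows whose 'date' falls inside the window (start exclusive)."""
    start, end = training_window(as_of)
    return [row for row in rows if start < row["date"] <= end]


def should_deploy(candidate_score, production_score, min_gain=0.0):
    """Promote the retrained model only if it beats the production model.

    Scores are assumed to come from the same held-out evaluation set,
    for example PR-AUC; min_gain is a hypothetical safety margin.
    """
    return candidate_score > production_score + min_gain


# Hypothetical rows and scores, for illustration only.
as_of = date(2026, 5, 28)
rows = [
    {"date": date(2026, 1, 15), "amount": 12.5},
    {"date": date(2026, 4, 2), "amount": 80.0},
    {"date": date(2026, 5, 28), "amount": 3.2},
]
recent = select_rows(rows, as_of)
print(len(recent), should_deploy(0.94, 0.93))
```

A strict comparison means a retrained model that merely ties the production model is not deployed, which keeps the gate consistent with the requirement for improved performance.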