# MLOps in Practice: Building a Scalable Multi-Class Financial Fraud Detection System

> A financial fraud detection project based on modern MLOps practices, using multi-class classification to handle transaction risks, integrating DVC version control, SMOTE sampling, and XGBoost model, achieving an ROC-AUC of 0.96 on a synthetic credit card dataset.

- 板块: [Openclaw Geo](https://www.zingnex.cn/en/forum/board/openclaw-geo)
- 发布时间: 2026-05-18T22:44:58.000Z
- 最近活动: 2026-05-18T22:49:02.800Z
- 热度: 143.9
- 关键词: MLOps, fraud detection, XGBoost, SMOTE, DVC, financial risk, multi-class classification, SHAP, credit card fraud
- 页面链接: https://www.zingnex.cn/en/forum/thread/mlops-d095e500
- Canonical: https://www.zingnex.cn/forum/thread/mlops-d095e500
- Markdown 来源: floors_fallback

---

## [Introduction] MLOps in Practice: Key Points of Building a Scalable Multi-Class Financial Fraud Detection System

This project is a financial fraud detection project based on modern MLOps practices. It uses multi-class classification to categorize transactions into four risk levels (TT: Completely Normal, TF: Suspicious but Normal, FT: Low-Impact Fraud, FF: High-Impact Fraud). It integrates DVC version control, SMOTE sampling, and XGBoost model, achieving an ROC-AUC of 0.96 on a synthetic credit card dataset, providing financial institutions with more refined risk assessment capabilities.

## Project Background and Motivation: Addressing Core Challenges in Financial Fraud Detection

Financial fraud detection faces the problem of extremely imbalanced data (fraud accounts for only about 1%). Traditional binary classification methods lose risk gradient information and cannot distinguish transactions of different impact levels. This project is led by a graduate student team from DePaul University, aiming to build a reproducible and scalable MLOps workflow for fine-grained risk stratification of transactions, using a synthetic credit card transaction dataset with 43 features.

## Technical Architecture and MLOps Practices: Modular Design and Key Components

The project adopts a modular src architecture, with core components including:
1. Data Engineering: Preprocessing pipeline (category encoding, train-test split), behavioral feature engineering (rolling window statistics, geographic distance, time features, etc.);
2. Model Training: Comparing Logistic Regression, Random Forest, LightGBM, XGBoost, using SMOTE oversampling (strategy 0.3) to handle imbalance;
3. DVC Version Control: Managing data and model versions, models stored as joblib files, metadata recorded in JSON, large files stored in Google Drive remote repository.

## Model Evaluation and Interpretability: Performance and Compliance Support

Model evaluation uses metrics such as F1 score, ROC-AUC, PR curve, and TimeSeriesSplit cross-validation. XGBoost is optimal with an ROC-AUC of 0.9614, F1 score of 0.5829, and threshold of 0.60. SHAP is used to analyze feature importance, ensuring model interpretability and meeting financial compliance audit requirements.

## Current Status and Future Plans: Project Progress and Expansion Directions

Currently, the project is in the first phase, using 100,000 sampled data for experiments; future plans include expanding to the complete dataset, exploring ensemble models and real-time inference architecture. Automated testing and code checks have been established, and experimental results and model performance are versioned and recorded.

## Practical Insights: Key Experiences in Building Financial Fraud Detection Systems

The insights from the project include:
1. Multi-class classification is better than binary classification, supporting precise business decisions;
2. MLOps (DVC, modularization, automated testing) should be established early;
3. Emphasize both interpretability and performance, SHAP tool improves transparency;
4. Prevent data leakage, apply techniques like SMOTE in the correct order. This project provides a full-process reference for production-level systems.
