XGBoost-Based Online Payment Fraud Detection System: From Data Imbalance to Production Deployment

This article provides an in-depth analysis of an end-to-end payment fraud detection project, exploring how to handle highly imbalanced financial data, optimize recall, and deploy the model to production with Streamlit.

Tags: fraud detection, XGBoost, class imbalance, SMOTE, financial risk control, Streamlit, machine learning, recall optimization

Published 2026-05-02 02:15 | Recent activity 2026-05-02 02:19 | Estimated read 7 min

Section 01

Introduction to XGBoost-Based Online Payment Fraud Detection System

This article introduces an end-to-end online payment fraud detection project built around the XGBoost algorithm. The project uses SMOTE oversampling and tuning of the scale_pos_weight parameter to handle highly imbalanced financial data, optimizes for recall, and deploys the model to production through Streamlit. Together these steps cover the full path from data processing to model application and offer a practical approach to financial risk control.

Section 02

Practical Challenges in Financial Fraud Detection

As digital payments become widespread, fraud detection has become a core risk control capability for financial institutions. Its central difficulty is class imbalance: fraudulent transactions often account for less than 1% of all transactions. A model optimized only for overall accuracy tends to predict every transaction as normal, which scores well on accuracy but has no practical value. The project therefore focuses on recall, prioritizing catching as many fraudulent transactions as possible even at the cost of more false positives.
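The accuracy trap described above can be illustrated with a small calculation. This is a minimal sketch on made-up data (1,000 transactions, 1% fraud), not the project's dataset; the helper names are chosen here for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true, y_pred):
    """Fraction of actual fraud cases (label 1) that were predicted as fraud."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    actual_pos = sum(y_true)
    return true_pos / actual_pos if actual_pos else 0.0


# Illustrative data: 1,000 transactions, 10 of them fraudulent (1%).
y_true = [1] * 10 + [0] * 990
# A degenerate "model" that labels every transaction as normal.
y_pred_all_normal = [0] * 1000

baseline_accuracy = accuracy(y_true, y_pred_all_normal)
baseline_recall = recall(y_true, y_pred_all_normal)
print(f"accuracy={baseline_accuracy:.3f}, recall={baseline_recall:.3f}")
# prints: accuracy=0.990, recall=0.000
```

The all-normal predictor reaches 99% accuracy while catching no fraud at all, which is why recall is the more informative target here.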

Section 03

Data Preprocessing and Feature Engineering Strategies

To address data imbalance, the project adopts a dual strategy:

SMOTE Oversampling: Generate synthetic fraud samples to balance the training data, which reduces the overfitting risk that comes with simply duplicating minority samples;
scale_pos_weight Parameter: Adjust the relative weight of positive and negative samples in XGBoost without modifying the original data, which is flexible and efficient.

Feature engineering is driven by business logic: account balance changes, deviation of the transaction amount from the account's history, the distribution of transaction frequency over time, payee behavior patterns, and so on. Key insight: fraud detection relies on behavior patterns rather than transaction amounts alone (small, frequent transactions may carry more risk).
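The two imbalance strategies above can be sketched in plain Python. The interpolation routine below is a simplified illustration of the idea behind SMOTE, not the implementation the project used (a library such as imbalanced-learn would normally provide it). The toy data, the feature values, and the helper names are assumptions made for this example. The negative-to-positive count ratio is a common starting value for scale_pos_weight, which is then tuned.

```python
import random


def scale_pos_weight_ratio(labels):
    """Negative-to-positive count ratio, a common starting value for scale_pos_weight."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0:
        raise ValueError("no positive samples")
    return negatives / positives


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Toy data (illustrative only): 995 normal and 5 fraudulent transactions.
labels = [0] * 995 + [1] * 5
ratio = scale_pos_weight_ratio(labels)  # 995 / 5 = 199.0

# Two hypothetical numeric features per fraud sample.
fraud_points = [(1.0, 5.0), (1.2, 4.8), (0.9, 5.5), (1.5, 5.1), (1.1, 4.6)]
synthetic_fraud = smote_like_oversample(fraud_points, n_new=10)

# scale_pos_weight and objective are XGBoost parameters; the values are a starting point.
xgb_params = {"objective": "binary:logistic", "scale_pos_weight": ratio}
print(ratio, len(synthetic_fraud))
```

Oversampling is normally applied to the training split only, so that the test set keeps the real class ratio. If the training set has already been balanced with SMOTE, a smaller scale_pos_weight than the raw ratio is likely appropriate.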

Section 04

Model Selection and Optimization

The project compares three algorithms: logistic regression (the baseline, with strong interpretability), random forest (a stable ensemble that is useful for feature importance analysis), and XGBoost (the final choice, a gradient boosting framework that supports custom loss functions and class weight adjustment). Optimization strategies include:

Threshold Tuning: Analyze the precision-recall curve to select the decision threshold, generally lowering it to improve recall;
Evaluation Metrics: Focus on recall (the proportion of fraudulent transactions identified), precision (the proportion of true fraud among predicted fraud), and F1 score (the harmonic mean of the two).
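Threshold tuning as described above can be sketched as a sweep over candidate thresholds. This is one plausible selection rule (the highest threshold that still meets a target recall), not necessarily the rule the project used. The scores and labels are made up for illustration. In practice a function such as scikit-learn's precision_recall_curve would supply the curve.

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when every score >= threshold is flagged as fraud."""
    tp = fp = fn = 0
    for label, score in zip(y_true, scores):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged and label == 0:
            fp += 1
        elif not flagged and label == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def pick_threshold(y_true, scores, min_recall):
    """Return (threshold, precision, recall) for the highest threshold whose
    recall reaches min_recall. Recall never decreases as the threshold drops,
    so the first hit in descending order is the highest such threshold."""
    for threshold in sorted(set(scores), reverse=True):
        precision, recall = precision_recall_at(y_true, scores, threshold)
        if recall >= min_recall:
            return threshold, precision, recall
    raise ValueError("target recall is not reachable")


# Illustrative model outputs: 4 fraud cases among 10 transactions.
y_true = [1, 0, 1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.90, 0.70, 0.60, 0.48, 0.45, 0.35, 0.30, 0.20, 0.10, 0.05]

default_precision, default_recall = precision_recall_at(y_true, scores, 0.5)
threshold, precision, recall = pick_threshold(y_true, scores, min_recall=0.75)
print(f"at 0.50: precision={default_precision:.2f}, recall={default_recall:.2f}")
print(f"at {threshold:.2f}: precision={precision:.2f}, recall={recall:.2f}")
```

On this toy data, lowering the threshold from 0.50 to 0.45 raises recall from 0.50 to 0.75 while precision falls from about 0.67 to 0.60, the trade-off that a recall-first project accepts.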

Section 05

Model Performance Analysis

The final XGBoost model's performance on the test set:

Metric	Value	Interpretation
Recall Rate	~0.68	Identifies 68% of fraudulent transactions
Precision	~0.24	24% of predicted fraud are true fraud
F1 Score	~0.35	Harmonic mean of precision and recall

Business value: The model intercepts roughly two thirds of fraudulent transactions. Precision is low, so most flagged transactions are false alarms, but because the cost of manual review is lower than the loss from fraud, the model can serve as a first screening layer ahead of manual review.
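The reported metrics can be cross-checked with a short calculation. The inputs are the approximate precision and recall from the table above; the variable names are chosen for this example.

```python
precision = 0.24  # approximate value reported for the test set
recall = 0.68     # approximate value reported for the test set

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Number of false alarms a reviewer sees for each true fraud that is flagged.
false_alarms_per_true_fraud = (1 - precision) / precision

print(f"F1 = {f1:.3f}")
print(f"false alarms per true fraud = {false_alarms_per_true_fraud:.2f}")
# prints: F1 = 0.355
# prints: false alarms per true fraud = 3.17
```

The computed F1 agrees with the reported value of about 0.35, and the second figure shows that reviewers handle roughly three false alarms for each real fraud case that is flagged.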

Section 06

Production Deployment Practice

The project builds a web application with Streamlit. Its features include real-time prediction for single transactions, fraud probability visualization, and prediction on uploaded batch data. Streamlit's advantages are pure Python development, no front-end experience required, and fast iteration. Deployment: the application is already live on Streamlit Community Cloud, which suits prototype demonstrations and lightweight production use; for enterprise-level deployment, consider Docker containerization, a Kubernetes cluster, or integration with existing risk control systems.
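A Streamlit application with the features listed above might look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the model file name, the use of joblib for persistence, the three feature names, and the label wording are hypothetical and are not taken from the project.

```python
def risk_label(probability: float, threshold: float = 0.5) -> str:
    """Map a fraud probability to the message shown in the UI."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    return "Flag for manual review" if probability >= threshold else "Looks normal"


def main() -> None:
    # Imported lazily so that risk_label can be used and tested without these packages.
    import joblib
    import pandas as pd
    import streamlit as st

    # Hypothetical file name; assumes a scikit-learn style classifier saved with joblib.
    model = joblib.load("fraud_model.joblib")

    st.title("Payment Fraud Detection")
    threshold = st.slider("Decision threshold", 0.0, 1.0, 0.5)

    # Single-transaction prediction (feature names are hypothetical).
    amount = st.number_input("Transaction amount", min_value=0.0, value=100.0)
    old_balance = st.number_input("Balance before", min_value=0.0, value=1000.0)
    new_balance = st.number_input("Balance after", min_value=0.0, value=900.0)
    if st.button("Predict"):
        features = pd.DataFrame(
            [{"amount": amount, "old_balance": old_balance, "new_balance": new_balance}]
        )
        probability = float(model.predict_proba(features)[0, 1])
        st.metric("Fraud probability", f"{probability:.1%}")
        st.progress(probability)  # simple probability visualization
        st.write(risk_label(probability, threshold))

    # Batch prediction; assumes the CSV columns match the training features.
    uploaded = st.file_uploader("Batch prediction (CSV)", type=["csv"])
    if uploaded is not None:
        batch = pd.read_csv(uploaded)
        batch["fraud_probability"] = model.predict_proba(batch)[:, 1]
        st.dataframe(batch)


# When saved as app.py, add a module-level call to main() and start the app with:
#   streamlit run app.py
```

Keeping the labeling logic in a plain function separates it from the UI code, which makes it easy to reuse if the model is later moved behind a different serving layer.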

Section 07

Key Experiences and Improvement Directions

Experience Summary:

1. Business understanding takes priority: defining the optimization goal and balancing automation against manual review matter more than parameter tuning;
2. Imbalanced data calls for a flexible combination of strategies such as SMOTE, weight adjustment, and threshold tuning;
3. The production chain must be completed, including model persistence, feature pipeline encapsulation, and web deployment.

Improvement Directions: Improve recall (ensemble learning or deep learning), strengthen real-time feature engineering, increase model interpretability (SHAP/LIME), and support online learning to adapt to evolving fraud patterns.
