Zing Forum

Customer Satisfaction Prediction: Practical Applications of Machine Learning and Deep Learning

This project builds a customer satisfaction prediction application that combines machine learning and deep learning models, with a focus on handling class imbalance and on selecting the best model through multi-metric evaluation.

Customer Satisfaction · Machine Learning · Deep Learning · Class Imbalance · XGBoost · Customer Analysis
Published 2026-06-11 16:45 · Recent activity 2026-06-11 17:06 · Estimated read 7 min

Section 03

Why is Customer Satisfaction So Important?

In a highly competitive market, acquiring a new customer is commonly estimated to cost 5 to 25 times as much as retaining an existing one. Customer satisfaction, usually measured with the Customer Satisfaction Score (CSAT), is a core indicator of customer experience and is closely tied to retention, word of mouth, and ultimately revenue.

Traditional satisfaction surveys rely on post-interaction questionnaires, so the signal arrives only after the experience is over. Predictive analytics can flag at-risk customers before they voice dissatisfaction, giving businesses a chance to intervene proactively. This project demonstrates how to build an end-to-end customer satisfaction prediction system that combines machine learning and deep learning.


Section 04

Prediction Objectives

Using customers' historical behavior, transaction records, and service interactions, predict how satisfied each customer is with the service, typically as a 1-5 rating or as a binary satisfied/dissatisfied label.
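One common convention (an assumption here, not stated in the source) is to treat ratings of 4 or 5 as "satisfied" and to report CSAT as the percentage of such ratings. The sketch below turns 1-5 scores into binary labels under that assumption; the threshold of 4, the sample ratings, and the helper names are illustrative only.

```python
from typing import Sequence


def to_binary_label(score: int, threshold: int = 4) -> int:
    """Map a 1-5 rating to 1 (satisfied) if score >= threshold, else 0 (dissatisfied)."""
    if not 1 <= score <= 5:
        raise ValueError(f"score out of range: {score}")
    return int(score >= threshold)


def csat_percent(scores: Sequence[int], threshold: int = 4) -> float:
    """CSAT as the percentage of ratings at or above the threshold."""
    labels = [to_binary_label(s, threshold) for s in scores]
    return 100.0 * sum(labels) / len(labels)


# Hypothetical survey ratings on a 1-5 scale.
ratings = [5, 4, 2, 5, 3, 1, 4, 5]
labels = [to_binary_label(r) for r in ratings]
print(labels)                 # [1, 1, 0, 1, 0, 0, 1, 1]
print(csat_percent(ratings))  # 62.5
```

Moving the threshold changes the class balance of the binary target, so it is worth fixing it before any modeling.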


Section 05

Key Challenges

Class Imbalance

  • Satisfied customers usually far outnumber dissatisfied ones
  • On a 1-5 scale, extreme scores (1 or 5) may be more common than middle scores
  • Models trained without correction tend to favor the majority class
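The pull toward the majority class is easiest to see with a degenerate classifier. In the minimal sketch below, which uses synthetic labels with an assumed 90/10 split, a "model" that always predicts "satisfied" scores 90% accuracy but catches none of the dissatisfied customers. This is why accuracy alone is a poor selection metric here.

```python
import numpy as np

# Synthetic labels: 1 = dissatisfied (minority), 0 = satisfied (majority).
y_true = np.array([0] * 90 + [1] * 10)

# A degenerate "model" that always predicts the majority class.
y_pred = np.zeros_like(y_true)

accuracy = (y_pred == y_true).mean()

# Recall on the minority class: the share of dissatisfied customers we catch.
tp = np.sum((y_pred == 1) & (y_true == 1))
fn = np.sum((y_pred == 0) & (y_true == 1))
minority_recall = tp / (tp + fn)

print(f"accuracy={accuracy:.2f}, minority recall={minority_recall:.2f}")
# accuracy=0.90, minority recall=0.00
```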

Feature Complexity

  • Customer data mixes numerical features (spend amount, usage duration) and categorical features (region, product type)
  • Time-series features (e.g., trends in purchase frequency)
  • Text features (customer service chat logs, reviews)

Data Quality Issues

  • Missing values (some customers did not fill in certain fields)
  • Outliers (e.g., unusually large transactions)
  • Data entry errors

Section 06

Data Cleaning

Missing Value Handling

  • Numerical features: fill with the median or mean, or predict the missing value from other features (model-based imputation)
  • Categorical features: fill with the mode or create an "Unknown" category
  • Features with a high missing rate (>50%): consider dropping them or handling them specially
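The three rules above can be sketched with the standard library alone. The helper names, the column-oriented dict layout, and the sample data are hypothetical. With pandas, the same steps would typically be written with `fillna`.

```python
import math
import statistics
from collections import Counter


def is_missing(v):
    """Treat None and float NaN as missing."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def impute_table(table, numeric_cols, categorical_cols,
                 drop_threshold=0.5, categorical_strategy="mode"):
    """Clean a column-oriented table (dict of column name -> list of values).

    - Columns whose missing rate exceeds drop_threshold are dropped.
    - Numeric columns: missing values -> column median.
    - Categorical columns: missing values -> mode, or the literal "Unknown"
      when categorical_strategy == "unknown".
    """
    cleaned = {}
    for name, values in table.items():
        missing = [is_missing(v) for v in values]
        if sum(missing) / len(values) > drop_threshold:
            continue  # too sparse: drop (or route to special handling)
        observed = [v for v, m in zip(values, missing) if not m]
        if name in numeric_cols:
            median = float(statistics.median(observed))
            cleaned[name] = [median if m else float(v) for v, m in zip(values, missing)]
        elif name in categorical_cols:
            if categorical_strategy == "unknown":
                fill = "Unknown"
            else:
                fill = Counter(observed).most_common(1)[0][0]
            cleaned[name] = [fill if m else v for v, m in zip(values, missing)]
        else:
            cleaned[name] = list(values)
    return cleaned


raw = {
    "spend":  [120.0, None, 80.0, 200.0, float("nan")],
    "region": ["north", None, "south", "north", "east"],
    "income": [None, None, None, 50000.0, None],  # 80% missing -> dropped
}
mode_filled = impute_table(raw, {"spend", "income"}, {"region"})
unknown_filled = impute_table(raw, {"spend", "income"}, {"region"},
                              categorical_strategy="unknown")
```

The median is used instead of the mean because it is less sensitive to the outliers covered in the next step.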

Outlier Detection and Handling

  • IQR method: flag points more than 1.5 × IQR below the first quartile (Q1) or above the third quartile (Q3)
  • Z-score: flag points with |z| > 3
  • Business rules: e.g., a single transaction exceeding 10 times the customer's historical average
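Here is a minimal NumPy sketch of the three detection rules. The function names and the synthetic amounts are illustrative. The factor of 10 comes from the business-rule example above, and the per-customer historical averages are assumed to be precomputed.

```python
import numpy as np


def iqr_outliers(x, k=1.5):
    """Flag points below Q1 - k*IQR or above Q3 + k*IQR."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)


def zscore_outliers(x, threshold=3.0):
    """Flag points whose absolute z-score exceeds the threshold."""
    z = (x - x.mean()) / x.std()
    return np.abs(z) > threshold


def business_rule_outliers(amounts, historical_avg, factor=10.0):
    """Flag transactions above `factor` times the customer's historical average."""
    return amounts > factor * historical_avg


# Synthetic data: 60 ordinary amounts plus one very large transaction.
amounts = np.concatenate([np.tile([20.0, 22.0, 24.0, 26.0, 28.0, 30.0], 10), [500.0]])
iqr_flags = iqr_outliers(amounts)
z_flags = zscore_outliers(amounts)

# Per-customer check: the 900.0 purchase exceeds 10 x its customer's average of 60.0.
rule_flags = business_rule_outliers(np.array([120.0, 45.0, 900.0]),
                                    np.array([100.0, 50.0, 60.0]))
```

On very small samples the z-score rule can miss outliers, because the outlier inflates the standard deviation it is measured against. The quartile-based IQR rule is more robust in that case.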

Data Type Conversion

  • Convert date strings to datetime objects
  • Categorical encoding: one-hot or label encoding
  • Text vectorization: TF-IDF or word embeddings
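The following sketch covers the three conversions, assuming ISO "YYYY-MM-DD" dates and whitespace tokenization. The sample values and helper names are hypothetical. The TF-IDF shown is the plain textbook variant (tf = count / document length, idf = ln(N / df)). scikit-learn's `TfidfVectorizer` uses a smoothed idf and L2 row normalization by default, so its numbers will differ.

```python
import math
from collections import Counter
from datetime import datetime

import numpy as np

# 1) Date strings -> datetime objects (ISO "YYYY-MM-DD" format assumed).
signup_raw = ["2024-01-15", "2023-11-02", "2024-03-30"]
signup = [datetime.strptime(s, "%Y-%m-%d") for s in signup_raw]


# 2) Categorical encoding.
def label_encode(values):
    """Map each category to an integer code (sorted, so codes are reproducible)."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    return np.array([index[v] for v in values]), categories


def one_hot_encode(values):
    """Return a 0/1 matrix with one column per category, plus the column order."""
    codes, categories = label_encode(values)
    matrix = np.zeros((len(values), len(categories)), dtype=int)
    matrix[np.arange(len(values)), codes] = 1
    return matrix, categories


# 3) Text vectorization: plain TF-IDF, one row per document, one column per term.
def tfidf(docs):
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n_docs = len(docs)
    df = {t: sum(t in doc for doc in tokenized) for t in vocab}
    idf = {t: math.log(n_docs / df[t]) for t in vocab}
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        rows.append([counts[t] / len(doc) * idf[t] for t in vocab])
    return np.array(rows), vocab


regions = ["north", "south", "north", "east"]
codes, cats = label_encode(regions)    # codes [1, 2, 1, 0]; cats ['east', 'north', 'south']
onehot, _ = one_hot_encode(regions)
weights, vocab = tfidf(["the delivery was slow", "the agent was helpful"])
```

Label encoding imposes an arbitrary order on categories, which tree models usually tolerate. One-hot encoding avoids that implied order and is the safer default for linear models and neural networks.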

Section 07

Feature Engineering

Time Feature Extraction

  • Customer tenure: number of days since first purchase
  • Activity: number of days since last purchase (Recency)
  • Frequency: number of purchases in the past 30/90/365 days
  • Amount: average order value, total consumption amount
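These time features can be computed from a transaction log relative to a fixed snapshot date. In the sketch below, the log, the snapshot date `as_of`, and the helper `time_features` are hypothetical. Transactions after the snapshot are ignored so that features do not leak information from the period being predicted.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction log: (customer_id, purchase_date, amount).
transactions = [
    ("c1", date(2024, 1, 10), 50.0),
    ("c1", date(2024, 5, 20), 70.0),
    ("c1", date(2024, 6, 25), 30.0),
    ("c2", date(2023, 7, 1), 200.0),
]
as_of = date(2024, 6, 30)  # snapshot date for feature computation


def time_features(transactions, as_of, windows=(30, 90, 365)):
    """Per-customer tenure, recency, windowed frequency, and spend features."""
    by_customer = defaultdict(list)
    for cid, d, amount in transactions:
        if d <= as_of:  # use only data on or before the snapshot
            by_customer[cid].append((d, amount))
    features = {}
    for cid, rows in by_customer.items():
        dates = [d for d, _ in rows]
        amounts = [a for _, a in rows]
        feats = {
            "tenure_days": (as_of - min(dates)).days,   # since first purchase
            "recency_days": (as_of - max(dates)).days,  # since last purchase
            "total_amount": sum(amounts),
            "avg_order_value": sum(amounts) / len(amounts),
        }
        for w in windows:  # purchases in the past w days
            feats[f"freq_{w}d"] = sum((as_of - d).days <= w for d in dates)
        features[cid] = feats
    return features


feats = time_features(transactions, as_of)
```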

RFM Model Features

  • Recency: number of days since the customer's last purchase
  • Frequency: number of purchases over the analysis period
  • Monetary: cumulative spend
  • RFM is a classic framework for customer value analysis
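A common way to use RFM, which is an assumption here since the source does not specify a scoring scheme, is to score each dimension from 1 to 5 by quintile and concatenate the scores into a segment code. Recency is reversed so that recent buyers score high. The sample values and helper name below are illustrative. Ties receive distinct ranks in input order, which a production version might handle differently.

```python
import numpy as np


def quantile_score(values, n_bins=5, higher_is_better=True):
    """Score each value 1..n_bins by its rank-based quantile bin."""
    values = np.asarray(values, dtype=float)
    ranks = values.argsort(kind="stable").argsort(kind="stable")  # 0..n-1
    scores = ranks * n_bins // len(values) + 1
    return scores if higher_is_better else n_bins + 1 - scores


# Hypothetical per-customer inputs (e.g., output of the time-feature step).
recency_days = [5, 40, 200, 12, 90]
frequency = [12, 3, 1, 8, 2]
monetary = [900.0, 150.0, 40.0, 600.0, 80.0]

r = quantile_score(recency_days, higher_is_better=False)  # recent = better
f = quantile_score(frequency)
m = quantile_score(monetary)
rfm_codes = [f"{a}{b}{c}" for a, b, c in zip(r, f, m)]
print(rfm_codes)  # ['555', '333', '111', '444', '222']
```

The raw R, F, and M values can also be fed to a model directly. The 1-5 scores mainly make segments easier to interpret.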

Interaction Features

  • Combine features, e.g., "spend amount × purchase frequency"
  • Such combinations let even linear models capture joint, non-linear effects
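As a minimal sketch, pairwise product features can be built with NumPy. The column names and values are hypothetical. scikit-learn's `PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)` produces the same set of columns.

```python
from itertools import combinations

import numpy as np


def pairwise_interactions(X, names):
    """Append the product of every pair of columns, e.g. amount x frequency."""
    cols, new_names = [X], list(names)
    for i, j in combinations(range(X.shape[1]), 2):
        cols.append((X[:, i] * X[:, j])[:, None])
        new_names.append(f"{names[i]}_x_{names[j]}")
    return np.hstack(cols), new_names


X = np.array([[100.0, 3.0, 12.0],
              [40.0, 1.0, 30.0]])
names = ["amount", "frequency", "tenure_months"]
X_aug, aug_names = pairwise_interactions(X, names)
# aug_names[3:] -> ['amount_x_frequency', 'amount_x_tenure_months',
#                   'frequency_x_tenure_months']
```

The number of pairs grows quadratically with the number of features, so in practice it is worth restricting interactions to a few pairs that make business sense.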

Feature Scaling

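  • Standardization (StandardScaler): rescale each feature to mean 0 and variance 1
  • Normalization (MinMaxScaler): rescale each feature to [0, 1]
  • Especially important for neural networks, which train more reliably when inputs share a similar scale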
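The two scalers can be sketched in NumPy, following the fit/transform pattern of scikit-learn's `StandardScaler` and `MinMaxScaler`. The class names below are hypothetical stand-ins, not the library classes. The key point is that the statistics are learned on the training split only and then reused on test data, so test values may fall outside [0, 1].

```python
import numpy as np


class StandardScalerSketch:
    """z = (x - mean) / std, with statistics learned from training data only."""

    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0)
        self.std_[self.std_ == 0] = 1.0  # avoid division by zero for constant columns
        return self

    def transform(self, X):
        return (X - self.mean_) / self.std_


class MinMaxScalerSketch:
    """x' = (x - min) / (max - min), mapping training data into [0, 1]."""

    def fit(self, X):
        self.min_ = X.min(axis=0)
        rng = X.max(axis=0) - self.min_
        rng[rng == 0] = 1.0  # constant columns map to 0
        self.range_ = rng
        return self

    def transform(self, X):
        return (X - self.min_) / self.range_


X_train = np.array([[10.0, 1.0], [20.0, 3.0], [30.0, 5.0]])
X_test = np.array([[40.0, 2.0]])

std = StandardScalerSketch().fit(X_train)
Z_train = std.transform(X_train)
mm = MinMaxScalerSketch().fit(X_train)
X_test_mm = mm.transform(X_test)  # first feature lands at 1.5, outside [0, 1]
```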

Section 08

Resampling Methods

Oversampling

  • Random Oversampling: duplicate minority-class samples; simple but prone to overfitting

  • SMOTE (Synthetic Minority Over-sampling Technique): create new samples by interpolating between a minority sample and one of its nearest minority-class neighbors; reduces the overfitting risk of plain duplication

  • ADASYN (Adaptive Synthetic Sampling): generate more synthetic samples around minority samples that are harder to learn, i.e. those surrounded by more majority-class neighbors
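Here is a minimal NumPy sketch of random oversampling and of the SMOTE interpolation step. The helper names and synthetic data are hypothetical, and this is not the imbalanced-learn implementation. In practice you would more likely call `imblearn.over_sampling.SMOTE(...).fit_resample(X, y)` or `ADASYN`. Whichever method you use, resample only the training split, never the validation or test data.

```python
import numpy as np


def random_oversample(X, y, minority_label, rng):
    """Duplicate randomly chosen minority rows until classes are balanced (binary)."""
    minority = np.flatnonzero(y == minority_label)
    n_needed = np.sum(y != minority_label) - minority.size
    extra = rng.choice(minority, size=n_needed, replace=True)
    return np.vstack([X, X[extra]]), np.concatenate([y, y[extra]])


def smote(X_min, n_new, k, rng):
    """Create n_new synthetic points by interpolating between a randomly chosen
    minority sample and one of its k nearest minority neighbours (needs k < len(X_min))."""
    # Pairwise squared Euclidean distances among minority samples.
    d = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)                 # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]   # k nearest per sample
    base = rng.integers(0, len(X_min), size=n_new)
    nb = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                # uniform in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])


rng = np.random.default_rng(42)
X_maj = rng.normal(0.0, 1.0, size=(20, 2))
X_min = rng.normal(3.0, 0.5, size=(5, 2))
X = np.vstack([X_maj, X_min])
y = np.array([0] * 20 + [1] * 5)

X_ro, y_ro = random_oversample(X, y, minority_label=1, rng=rng)
X_syn = smote(X_min, n_new=15, k=3, rng=rng)  # 15 synthetic minority points
```

Because each synthetic point lies on a segment between two real minority samples, SMOTE fills in the minority region instead of stacking exact copies.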

Undersampling

  • Random Undersampling: randomly remove majority-class samples; may discard useful information

  • Tomek Links: find pairs of samples from different classes that are each other's nearest neighbors and remove the majority-class member, which cleans up the class boundary

  • Edited Nearest Neighbors: remove majority-class samples that are misclassified by their k nearest neighbors
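The three undersampling rules can be sketched in NumPy for a binary problem, using brute-force nearest neighbors. The helper names and toy data are hypothetical. imbalanced-learn provides `RandomUnderSampler`, `TomekLinks`, and `EditedNearestNeighbours` in `imblearn.under_sampling`, and its defaults may differ from these simplified rules.

```python
import numpy as np


def random_undersample(X, y, majority_label, rng):
    """Keep a random subset of majority rows equal in size to the other rows."""
    majority = np.flatnonzero(y == majority_label)
    others = np.flatnonzero(y != majority_label)
    keep_major = rng.choice(majority, size=others.size, replace=False)
    keep = np.sort(np.concatenate([others, keep_major]))
    return X[keep], y[keep]


def neighbour_order(X):
    """Row i lists all other points sorted by distance to point i."""
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)


def remove_tomek_links(X, y, majority_label):
    """Drop the majority member of every pair of opposite-class mutual nearest neighbours."""
    nn = neighbour_order(X)[:, 0]
    mutual = nn[nn] == np.arange(len(X))
    drop = mutual & (y != y[nn]) & (y == majority_label)
    return X[~drop], y[~drop]


def edited_nearest_neighbours(X, y, majority_label, k=3):
    """Drop majority rows that a k-NN majority vote would misclassify (binary, odd k)."""
    nn = neighbour_order(X)[:, :k]
    agree = (y[nn] == y[:, None]).sum(axis=1)
    drop = (y == majority_label) & (agree <= k // 2)
    return X[~drop], y[~drop]


rng = np.random.default_rng(0)
X_ru, y_ru = random_undersample(np.arange(10.0).reshape(-1, 1),
                                np.array([0] * 8 + [1] * 2), majority_label=0, rng=rng)

# 1-D toy data: the points at 1.0 (class 0) and 1.1 (class 1) form a Tomek link.
X_t = np.array([[0.0], [0.3], [0.5], [1.0], [1.1], [3.0], [3.4]])
y_t = np.array([0, 0, 0, 0, 1, 1, 1])
X_tl, y_tl = remove_tomek_links(X_t, y_t, majority_label=0)

# The class-0 point at 3.1 sits inside the class-1 cluster, so ENN removes it.
X_e = np.array([[0.0], [0.3], [0.5], [3.1], [3.0], [3.4], [3.7]])
y_e = np.array([0, 0, 0, 0, 1, 1, 1])
X_enn, y_enn = edited_nearest_neighbours(X_e, y_e, majority_label=0)
```

Tomek links and ENN are cleaning methods: they remove only a few ambiguous majority points near the boundary, so on their own they rarely balance the classes.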

Hybrid Strategies

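  • SMOTE + Tomek Links: oversample with SMOTE, then remove Tomek links to clean the boundary
  • More generally: oversample the minority class first, then undersample the majority class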
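Here is a minimal sketch of the "oversample first, then undersample" ordering for a binary problem. To keep the two steps visible it uses plain random over- and undersampling rather than SMOTE and Tomek links. The function name and both ratios are illustrative assumptions. imbalanced-learn packages the SMOTE + Tomek combination as `imblearn.combine.SMOTETomek`. The idea is that moderate oversampling creates fewer duplicates, and moderate undersampling discards fewer majority samples, than either method alone.

```python
import numpy as np


def over_then_under(X, y, minority_label, over_ratio=0.5, under_ratio=1.0, seed=0):
    """Hybrid resampling sketch for a binary problem.

    1. Randomly oversample the minority class until n_min = over_ratio * n_maj.
    2. Randomly undersample the majority class until n_min = under_ratio * n_maj.
    """
    rng = np.random.default_rng(seed)
    min_idx = np.flatnonzero(y == minority_label)
    maj_idx = np.flatnonzero(y != minority_label)

    # Step 1: duplicate minority rows (sampling with replacement).
    target_min = max(min_idx.size, int(over_ratio * maj_idx.size))
    extra = rng.choice(min_idx, size=target_min - min_idx.size, replace=True)
    min_idx = np.concatenate([min_idx, extra])

    # Step 2: drop majority rows (sampling without replacement).
    target_maj = min(maj_idx.size, int(min_idx.size / under_ratio))
    maj_idx = rng.choice(maj_idx, size=target_maj, replace=False)

    keep = np.concatenate([min_idx, maj_idx])
    return X[keep], y[keep]


X = np.arange(110.0).reshape(-1, 1)
y = np.array([0] * 100 + [1] * 10)
X_res, y_res = over_then_under(X, y, minority_label=1)
# 10 minority rows grow to 50, then 100 majority rows shrink to 50.
```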