# Predicting Bank Customer Churn Using Deep Neural Networks: A Complete Practice from Data Preprocessing to Model Optimization

> This article walks through a bank customer churn prediction project built on feedforward neural networks. It covers data exploration, preprocessing, a comparison of six model configurations, and the use of SMOTE to address class imbalance, reaching roughly 74% recall on churned customers.

- Board: [Openclaw Geo](https://www.zingnex.cn/en/forum/board/openclaw-geo)
- Published: 2026-06-02T02:44:25.000Z
- Last activity: 2026-06-02T02:51:47.236Z
- Popularity: 152.9
- Keywords: customer churn prediction, deep neural networks, class imbalance, SMOTE, TensorFlow, Keras, machine learning engineering, banking, recall optimization
- Page link: https://www.zingnex.cn/en/forum/thread/geo-github-thehotpath-bank-churn-neural-network
- Canonical: https://www.zingnex.cn/forum/thread/geo-github-thehotpath-bank-churn-neural-network
- Markdown source: floors_fallback

---

## [Introduction] A Complete Practice of Predicting Bank Customer Churn Using Deep Neural Networks

The project predicts bank customer churn with feedforward neural networks. It comes from a GitHub repository maintained by thehotpath and works as an end-to-end machine learning engineering case study: data exploration, preprocessing, a comparison of six model configurations, and SMOTE-based handling of class imbalance. The final model reaches roughly 74% recall on churned customers, which makes the project a practical reference for imbalanced classification problems in business settings.

## Project Background and Business Value

Customer churn is a core challenge in the banking industry, because acquiring a new customer costs much more than retaining an existing one. This project builds its models on a dataset of 10,000 bank customers, of whom only about 20% churned, so the classes are imbalanced. Recall on the churned class is the primary optimization target: missing a customer who is about to leave (a false negative) costs more than flagging a loyal customer as likely to churn (a false positive).

## Dataset and Feature Engineering

The dataset contains 14 fields for 10,000 customers:

- Numerical features: credit score, age, tenure, account balance, estimated income
- Categorical features: geographic location, gender
- Product-related features: number of products, credit card ownership, active membership status
- Target variable: Exited (1 = churned)

Preprocessing consists of three steps: remove identifiers such as RowNumber, one-hot encode the categorical variables, and standardize the numerical features. EDA shows no obvious linear correlation between features, so no dimensionality reduction is applied.
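The three preprocessing steps can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the sample rows are made up, the column names other than `RowNumber` and `Exited` are assumed to follow the common public bank churn dataset layout, and `preprocess` is a hypothetical helper. Standardization is done by hand here to keep the example dependency-light; a library scaler would serve the same purpose.

```python
import pandas as pd

# Made-up rows for illustration only; column names other than RowNumber
# and Exited are assumptions about the dataset layout.
raw = pd.DataFrame({
    "RowNumber": [1, 2, 3, 4],
    "CustomerId": [101, 102, 103, 104],
    "Surname": ["A", "B", "C", "D"],
    "CreditScore": [619, 608, 502, 699],
    "Geography": ["France", "Spain", "France", "Germany"],
    "Gender": ["Female", "Female", "Male", "Male"],
    "Age": [42, 41, 42, 39],
    "Balance": [0.0, 83807.86, 159660.80, 0.0],
    "Exited": [1, 0, 1, 0],
})

ID_COLS = ["RowNumber", "CustomerId", "Surname"]  # identifiers carry no signal
CAT_COLS = ["Geography", "Gender"]
NUM_COLS = ["CreditScore", "Age", "Balance"]


def preprocess(df, num_stats=None):
    """Drop identifiers, one-hot encode categoricals, standardize numericals.

    num_stats is a (mean, std) pair computed on the training data; pass it
    back in when transforming validation or test data.
    """
    y = df["Exited"].to_numpy()
    X = df.drop(columns=ID_COLS + ["Exited"])
    X = pd.get_dummies(X, columns=CAT_COLS, dtype=float).astype(float)
    if num_stats is None:
        num_stats = (X[NUM_COLS].mean(), X[NUM_COLS].std(ddof=0))
    mean, std = num_stats
    X[NUM_COLS] = (X[NUM_COLS] - mean) / std
    return X, y, num_stats


X, y, stats = preprocess(raw)
print(X.columns.tolist())
```

Returning the statistics lets the same mean and standard deviation be reused on held-out data, so that information from the test set does not leak into the scaling.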

## Model Architectures and Experimental Design

Six neural network configurations are compared:

1. Basic SGD network
2. Adam-optimized network
3. Adam + Dropout network
4. SMOTE + SGD network
5. SMOTE + Adam network
6. SMOTE + Adam + Dropout network

SMOTE generates synthetic samples for the minority class via interpolation, which reduces the bias caused by class imbalance. Because each configuration adds one component at a time, the contribution of each component to performance can be observed directly.
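To make the interpolation idea concrete, here is a small NumPy sketch of SMOTE-style oversampling. It is an illustration of the technique under simplifying assumptions, not the implementation used in the repository (the source does not say which one it uses, and a library implementation such as imbalanced-learn's `SMOTE` would be the usual choice). The function name and parameters are hypothetical.

```python
import numpy as np


def smote_like_oversample(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by interpolating between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours. Requires at least two minority samples."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)

    # Pairwise squared Euclidean distances within the minority class.
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)                 # a point is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]   # shape (n, k)

    base = rng.integers(0, n, size=n_new)                    # anchor samples
    picked = neighbours[base, rng.integers(0, k, size=n_new)]  # one neighbour each
    gap = rng.random((n_new, 1))                             # position along the segment

    # Each synthetic row lies on the line segment between anchor and neighbour.
    return X_min[base] + gap * (X_min[picked] - X_min[base])


minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_like_oversample(minority, n_new=10, k=2)
print(synthetic.shape)
```

Oversampling of this kind is normally applied to the training split only, so that validation and test metrics reflect the original class distribution.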

## Key Finding: The Critical Role of SMOTE in Improving Recall Rate

Balancing the data with SMOTE is the decisive factor in improving recall. The final SMOTE + Adam + Dropout model reaches approximately 74% recall on the test set, identifying 301 of 407 churned customers. The models trained without SMOTE have lower recall and miss more churned customers. The SMOTE model does produce more false positives (476 cases), but this is acceptable in business terms: a retention offer sent to a loyal customer costs less than losing a customer who was about to churn. The experiments are documented with 45 charts covering EDA, training curves, and confusion matrices.
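The reported counts are enough to check the recall figure and to derive the precision that accompanies it. The snippet below uses only the three numbers stated above (301 identified, 407 churned, 476 false positives); the number of true negatives is not given in the source, so accuracy is not computed.

```python
# Counts reported for the final SMOTE + Adam + Dropout model on the test set.
true_positives = 301
actual_churned = 407
false_positives = 476

false_negatives = actual_churned - true_positives  # churned customers missed

recall = true_positives / (true_positives + false_negatives)
precision = true_positives / (true_positives + false_positives)
f1 = 2 * precision * recall / (precision + recall)

print(f"missed churners: {false_negatives}")
print(f"recall:    {recall:.3f}")
print(f"precision: {precision:.3f}")
print(f"F1:        {f1:.3f}")
```

Recall comes out just under 0.74, matching the reported figure, while precision is below 0.4. That gap is the trade-off described above: most flagged customers would not actually have churned, which is tolerable only because contacting them is comparatively cheap.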

## Business Application Recommendations

Practical strategies based on model insights:
1. Precision marketing: Push personalized offers and loyalty programs to high-risk customers;
2. Lifecycle management: Provide financial planning for older customers and strengthen onboarding for new customers;
3. Account activation: Incentivize zero-balance/inactive accounts to re-engage;
4. Product cross-selling: Promote additional products to enhance customer stickiness;
5. Regional strategy: Analyze the causes of high churn in regions and develop localized solutions.

## Technical Implementation and Reproducibility

The project is built with Python 3.10+ and TensorFlow/Keras, and the code is organized into a data pipeline, model definitions, training scripts, and evaluation metrics. Dependencies are listed in `requirements.txt` and the dataset is in the `data/` directory, which supports reproducing the results. The complete analysis is in `notebook/Bustos_INN_Learner_Notebook_Full_code.html`, which can be viewed in a browser or via nbviewer. The project is released under the MIT license and can be freely used for learning and commercial purposes.
