Zing Forum


Using K-Means Clustering Algorithm for Mall Customer Segmentation: From Elbow Method to Visualization Practice

This article walks through a complete hands-on machine learning project that uses the K-Means clustering algorithm to segment mall customers. It chooses the number of clusters with the elbow method and visualizes how five customer groups are distributed across the features, illustrating how unsupervised learning applies to business analysis.

Tags: K-Means Clustering, Customer Segmentation, Machine Learning, Elbow Method, Unsupervised Learning, Python, scikit-learn, Data Visualization
Published 2026-06-09 15:45 · Recent activity 2026-06-09 15:50 · Estimated read 7 min

Section 01

Introduction: K-Means Clustering for Mall Customer Segmentation, from the Elbow Method to Visualization

This article presents a complete hands-on machine learning project that uses the K-Means clustering algorithm to segment mall customers. The core workflow uses the elbow method to choose K=5 clusters, then visualizes the feature distribution of the five resulting customer groups, showing how unsupervised learning is applied in business analysis. The project comes from the SkillCraft Machine Learning Internship Task, and the original code was published on GitHub by srethulak (link: https://github.com/srethulak/SkillCraft-ML-Task02-Mall-Customer-Segmentation).


Section 02

Project Background and Significance: The Value of Unsupervised Learning in Customer Segmentation

Customer segmentation is a classic application of unsupervised learning, which discovers hidden patterns in data without requiring labels. For retailers, understanding the characteristics of different customer groups is key to designing targeted marketing strategies. This project walks through the entire process, from data loading and feature selection to model training and result visualization, using K-Means to segment mall customers by annual income and spending score as a basis for differentiated marketing.


Section 03

Dataset Selection: Focusing on Two Core Dimensions—Annual Income and Spending Score

The project uses the classic Mall Customer Dataset and selects two key features:

  • Annual Income: in thousands of US dollars, reflecting purchasing power;
  • Spending Score: a score from 1 to 100, reflecting willingness to spend.

Together, the two features separate customers by ability to buy versus willingness to buy.

Section 04

Method: Elbow Method to Determine Optimal Number of Clusters K

K-Means requires K to be specified in advance. The elbow method chooses K by examining the WCSS (Within-Cluster Sum of Squares), the sum of squared distances from each point to the centroid of its cluster:

  1. Principle: WCSS keeps falling as K grows, so its minimum cannot be used directly; instead, look for the "elbow", the K after which adding more clusters yields only small reductions;
  2. Implementation: Fit K-Means for K from 1 to 10 with k-means++ initialization and record each model's WCSS (exposed as inertia_ in scikit-learn);
  3. Conclusion: The curve shows a clear elbow at K=5, so five clusters are used.
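The elbow search described above can be sketched with scikit-learn as follows. This is a minimal sketch, not the original project's code: because the mall CSV is not reproduced here, it runs on synthetic data from make_blobs with five hypothetical group centers laid out like the income/spending groups described in the next section, and it sets n_init=10 explicitly so the result does not depend on the library version's default. Instead of plotting, it prints the relative WCSS drop at each step, which makes the elbow visible numerically; plotting wcss against K gives the usual elbow curve.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for (annual income, spending score); NOT the real dataset.
# The five centers are hypothetical low/medium/high income-spending positions.
CENTERS = [[25, 20], [25, 80], [55, 50], [85, 20], [85, 80]]
X, _ = make_blobs(n_samples=200, centers=CENTERS, cluster_std=6.0, random_state=42)


def wcss_for_k_range(X, k_max=10, random_state=42):
    """Fit K-Means for K = 1..k_max and return the WCSS of each fit.

    scikit-learn exposes WCSS as `inertia_`: the sum of squared distances
    from every sample to its closest cluster center.
    """
    wcss = []
    for k in range(1, k_max + 1):
        model = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=random_state)
        model.fit(X)
        wcss.append(model.inertia_)
    return wcss


wcss = wcss_for_k_range(X)
for k, value in enumerate(wcss, start=1):
    if k == 1:
        print(f"K={k:2d}  WCSS={value:10.1f}")
    else:
        drop = 1 - value / wcss[k - 2]  # relative reduction vs. K-1 clusters
        print(f"K={k:2d}  WCSS={value:10.1f}  ({drop:.0%} lower than K={k - 1})")
```

On this synthetic data the drop from K=4 to K=5 is large and the drop from K=5 to K=6 is small, which is what an elbow at K=5 looks like. On the real dataset the numbers will differ.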

Section 05

Clustering Results: Feature Analysis of 5 Customer Groups

With K=5, the customers fall into five groups:

  • Low Income & Low Spending Group: low income and little willingness to spend;
  • Low Income & High Spending Group: modest income but a strong willingness to spend;
  • Medium Income & Medium Spending Group: moderate income and moderate spending;
  • High Income & Low Spending Group: high income but conservative spending;
  • High Income & High Spending Group: high income and active spending (the core high-value customers).

Differentiated marketing strategies can then be designed for each group, for example pushing high-end promotions to the High Income & Low Spending Group.
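The group names above come from where each cluster center sits relative to the income and spending ranges. A hypothetical way to automate that naming is sketched below. The helpers level and describe_clusters are illustrative (not from the original project), splitting each feature's observed range into three equal-width bands is an arbitrary assumed threshold, and the data is again synthetic rather than the mall dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def level(value, lo, hi):
    """Map a value to Low/Medium/High by splitting [lo, hi] into three equal bands."""
    band = (hi - lo) / 3
    if value < lo + band:
        return "Low"
    if value < lo + 2 * band:
        return "Medium"
    return "High"


def describe_clusters(X, centers):
    """Name each centroid by its income (column 0) and spending (column 1) level."""
    inc_lo, inc_hi = X[:, 0].min(), X[:, 0].max()
    spend_lo, spend_hi = X[:, 1].min(), X[:, 1].max()
    return [
        f"{level(c[0], inc_lo, inc_hi)} Income & {level(c[1], spend_lo, spend_hi)} Spending"
        for c in centers
    ]


# Synthetic stand-in data (NOT the real mall dataset), five hypothetical groups.
X, _ = make_blobs(
    n_samples=200,
    centers=[[25, 20], [25, 80], [55, 50], [85, 20], [85, 80]],
    cluster_std=6.0,
    random_state=42,
)
model = KMeans(n_clusters=5, init="k-means++", n_init=10, random_state=42).fit(X)

names = describe_clusters(X, model.cluster_centers_)
sizes = np.bincount(model.labels_, minlength=5)  # customers per cluster
for cluster_id, (name, size) in enumerate(zip(names, sizes)):
    print(f"cluster {cluster_id}: {name} ({size} customers)")
```

Automatic names like these are only a starting point. As the article notes later, turning clusters into strategies still needs domain knowledge.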

Section 06

Key Technical Implementation Points: Full Process from Preprocessing to Visualization

The technical process includes:

  1. Data Preprocessing: Load data with pandas and select the annual income and spending score columns;
  2. Model Training: Use scikit-learn's KMeans with parameters n_clusters=5, init='k-means++', random_state=42;
  3. Visualization: Use matplotlib to draw scatter plots, distinguish clusters with different colors, and mark cluster centers.
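The three steps can be sketched end to end as follows. This is a hedged reconstruction, not the original script. The column names "Annual Income (k$)" and "Spending Score (1-100)" are assumed from the commonly distributed version of the dataset. A synthetic DataFrame stands in for the CSV, and n_init=10 is added to the parameters listed above.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

INCOME = "Annual Income (k$)"        # assumed column name
SPENDING = "Spending Score (1-100)"  # assumed column name

# 1. Data: in the original project this is pd.read_csv(...) on the mall CSV,
#    which is not reproduced here; a synthetic DataFrame with the same two
#    columns stands in for it.
points, _ = make_blobs(
    n_samples=200,
    centers=[[25, 20], [25, 80], [55, 50], [85, 20], [85, 80]],
    cluster_std=6.0,
    random_state=42,
)
df = pd.DataFrame(points, columns=[INCOME, SPENDING])
X = df[[INCOME, SPENDING]].to_numpy()  # select the two feature columns

# 2. Model: the parameters listed in the article, plus an explicit n_init.
kmeans = KMeans(n_clusters=5, init="k-means++", n_init=10, random_state=42)
df["Cluster"] = kmeans.fit_predict(X)

# 3. Visualization: one color per cluster, centroids marked with a black "X".
fig, ax = plt.subplots(figsize=(7, 5))
for cluster_id, group in df.groupby("Cluster"):
    ax.scatter(group[INCOME], group[SPENDING], s=25, label=f"Cluster {cluster_id}")
centers = kmeans.cluster_centers_
ax.scatter(centers[:, 0], centers[:, 1], s=200, c="black", marker="X", label="Centroids")
ax.set_xlabel(INCOME)
ax.set_ylabel(SPENDING)
ax.set_title("Customer segments (K = 5)")
ax.legend()
fig.savefig("customer_segments.png", dpi=120)
plt.close(fig)
```

Storing the labels in a "Cluster" column keeps each customer's segment next to the original features, which is convenient for the per-group analysis in the previous section.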

Section 07

Practical Insights: Feature Selection, Business Interpretation, and Algorithm Limitations

Insights from the project:

  • Feature Selection: Real-world segmentation usually needs more dimensions than income and spending, such as age and gender;
  • Business Interpretation: Clustering results must be combined with domain knowledge before they can be turned into strategies;
  • Algorithm Limitations: K-Means assumes roughly spherical, similarly sized clusters, so it performs poorly on non-convex shapes or on clusters with very different densities; density-based algorithms such as DBSCAN are a common alternative for non-convex shapes.
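The non-convex limitation is easy to reproduce on scikit-learn's two-moons toy data. In the sketch below, K-Means with K=2 separates the moons with a straight boundary, while DBSCAN can follow their curved shape. The values eps=0.2 and min_samples=5 are illustrative choices for this toy data, not recommendations for the mall dataset. The Adjusted Rand Index (ARI) scores how well each labeling agrees with the true moon membership.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaving half-moons: a classic non-convex shape (synthetic data).
X, y_true = make_moons(n_samples=300, noise=0.05, random_state=42)

kmeans_labels = KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=42).fit_predict(X)
# eps and min_samples are illustrative values tuned for this toy data set.
dbscan_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# Adjusted Rand Index: 1.0 = perfect agreement with the true moons, ~0 = chance.
ari_kmeans = adjusted_rand_score(y_true, kmeans_labels)
ari_dbscan = adjusted_rand_score(y_true, dbscan_labels)
print(f"K-Means ARI: {ari_kmeans:.2f}")
print(f"DBSCAN  ARI: {ari_dbscan:.2f}")
```

DBSCAN labels points it treats as noise as -1, and adjusted_rand_score simply counts them as one more cluster. DBSCAN also does not need K in advance, but its single eps makes it sensitive to clusters of very different density.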

Section 08

Summary: Closing the Loop from Technique to Business Value

This project demonstrates the full application of K-Means to customer segmentation, from choosing K to visualizing the results, which makes it a good starter project for unsupervised learning. The value of customer segmentation lies in turning data insights into business actions: the technique surfaces the segments, and business decisions close the loop by acting on them.