Zing Forum

Rossmann Sales Forecasting Practice: How to Optimize Retail Operation Decisions with Machine Learning

A sales forecasting project built on real data from 1,115 Rossmann drug stores in Germany. It forecasts daily sales six weeks ahead using K-Means clustering, gradient boosting, random forests, and neural networks, giving operational decisions a data-driven basis.

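Sales forecasting, retail analytics, gradient boosting, K-Means clustering, feature engineering, machine learning, operations optimization, time series, data cleaning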
Published 2026-05-17 21:45 | Recent activity 2026-05-17 21:56 | Estimated read 5 min

Section 01

Rossmann Sales Forecasting Practice: Core Ideas and Value Guide

This sales forecasting project is built on real data from 1,115 Rossmann drug stores in Germany. It forecasts daily sales six weeks ahead using K-Means clustering, gradient boosting, random forests, and neural networks. The technical results are tied directly to operational decisions, providing data support for inventory management, staff scheduling, and promotion planning.

Section 02

Business Background and Challenges

Rossmann is one of the largest drug store chains in Europe, operating over 3,000 stores in 7 European countries. Accurate sales forecasting is the foundation for inventory management, staff scheduling, and promotion planning; forecast errors lead to either overstock or stockouts. The project focuses on integrating forecasting with operational decisions. Its two core questions are which factors drive daily sales fluctuations across the 1,115 stores, and whether sales can be forecast six weeks ahead.

Section 03

Data Overview and Cleaning

The project uses the Kaggle competition dataset: training data from 2013 to July 2015 (about 1.01 million records) and test data from August to September 2015 (about 41,000 records), covering both store information and transaction data. The cleaning steps were: delete empty columns, fill missing competition distances with the median (2,325 meters), remove records where the store was open but sales were zero, and unify the format of the StateHoliday field.
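The cleaning steps above can be sketched in pandas. This is a minimal illustration, not the project's actual code: the function name clean_rossmann is hypothetical, the toy rows are made up, and the column names Store, Open, Sales, and CompetitionDistance are assumed from the public Kaggle schema (only StateHoliday and the competition distance are named in the text).

```python
import pandas as pd

def clean_rossmann(train: pd.DataFrame, store: pd.DataFrame) -> pd.DataFrame:
    """Apply the four cleaning steps and join store info onto the daily rows."""
    # 1. Drop columns that contain no values at all.
    train = train.dropna(axis=1, how="all")
    store = store.dropna(axis=1, how="all").copy()

    # 2. Fill missing competitor distances with the median of the known ones.
    median_distance = store["CompetitionDistance"].median()
    store["CompetitionDistance"] = store["CompetitionDistance"].fillna(median_distance)

    # 3. Remove rows where the store was open but recorded zero sales.
    open_but_zero = (train["Open"] == 1) & (train["Sales"] == 0)
    train = train.loc[~open_but_zero].copy()

    # 4. StateHoliday can mix the integer 0 with strings such as "0" and "a";
    #    casting to str gives one consistent representation.
    train["StateHoliday"] = train["StateHoliday"].astype(str)

    return train.merge(store, on="Store", how="left")

# Toy data (made up) exercising each step.
train = pd.DataFrame({
    "Store": [1, 1, 2, 2],
    "Open": [1, 1, 1, 0],
    "Sales": [5000, 0, 6000, 0],
    "StateHoliday": [0, "0", "a", "0"],
})
store = pd.DataFrame({
    "Store": [1, 2, 3],
    "CompetitionDistance": [1000.0, None, 3000.0],
})
cleaned = clean_rossmann(train, store)
print(cleaned)
```

Closed days with zero sales are kept here, since the text only mentions removing open days with zero sales; whether to drop closed days as well is a separate modeling choice.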

Section 04

Exploratory Data Analysis Findings

Store characteristics: Type B stores average 10,060 euros in daily sales, against 5,738 euros for Type D, and the Type B product assortment has the highest average transaction value. Time factors: Monday is the strongest day of the week, and the year peaks in December and bottoms out in July. Promotions: a same-day promotion lifts sales by 81%, while the periodic mail promotion has only a weak effect. Competition: a new competitor has a large impact in the period just after it opens. External factors: state holidays have a significant positive effect on specific stores.
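Figures like the promotion uplift and the per-type averages come from simple group-by comparisons. The sketch below shows one way to compute them; promo_uplift is a hypothetical helper, the column names Open, Promo, Sales, and StoreType are assumed from the public Kaggle schema, and the toy numbers are invented, so the result is not the 81% reported above.

```python
import pandas as pd

def promo_uplift(df: pd.DataFrame) -> float:
    """Relative increase in mean sales on promo days versus non-promo days."""
    # Only compare days on which the store was open and actually sold something.
    open_days = df[(df["Open"] == 1) & (df["Sales"] > 0)]
    mean_by_promo = open_days.groupby("Promo")["Sales"].mean()
    return float(mean_by_promo.loc[1] / mean_by_promo.loc[0] - 1.0)

# Toy data (made up): two non-promo days, two promo days, one closed day.
toy = pd.DataFrame({
    "Open":      [1, 1, 1, 1, 0],
    "Promo":     [0, 0, 1, 1, 0],
    "Sales":     [4000, 6000, 8000, 10000, 0],
    "StoreType": ["a", "b", "a", "b", "a"],
})

uplift = promo_uplift(toy)
mean_by_type = toy[toy["Open"] == 1].groupby("StoreType")["Sales"].mean()

print(f"promo uplift: {uplift:.0%}")
print(mean_by_type)
```

A raw comparison like this does not control for store type or season, so it describes an association rather than the causal effect of a promotion.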

Section 05

Key Feature Engineering Strategies

Feature engineering expands the dataset to 25 fields. The core features are CompetitionOpen (the number of months since the competitor opened), LogCompetitionDistance (the log-transformed competition distance), and IsPromo2Month (a marker for the promotion cycle). These features significantly improve model performance.
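A possible construction of the three core features is sketched below. The feature names come from the text, but the formulas are assumptions: the source does not say how unknown opening dates are handled or which log transform is used. The input columns (Date, CompetitionOpenSinceYear, CompetitionOpenSinceMonth, Promo2, PromoInterval) are assumed from the public Kaggle schema, including its apparent spelling of September as "Sept".

```python
import numpy as np
import pandas as pd

# Assumption: PromoInterval lists month abbreviations such as "Jan,Apr,Jul,Oct".
MONTH_ABBR = {1: "Jan", 2: "Feb", 3: "Mar", 4: "Apr", 5: "May", 6: "Jun",
              7: "Jul", 8: "Aug", 9: "Sept", 10: "Oct", 11: "Nov", 12: "Dec"}

def add_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    date = pd.to_datetime(out["Date"])

    # CompetitionOpen: months between the competitor's opening and the sales date.
    # Unknown openings and competitors that open in the future are set to 0.
    months = ((date.dt.year - out["CompetitionOpenSinceYear"]) * 12
              + (date.dt.month - out["CompetitionOpenSinceMonth"]))
    out["CompetitionOpen"] = months.fillna(0).clip(lower=0)

    # LogCompetitionDistance: log1p is used so a distance of 0 stays finite.
    out["LogCompetitionDistance"] = np.log1p(out["CompetitionDistance"])

    # IsPromo2Month: 1 when the store takes part in Promo2 and the current
    # month is one of the months listed in its PromoInterval.
    month_name = date.dt.month.map(MONTH_ABBR)
    intervals = out["PromoInterval"].fillna("").str.split(",")
    out["IsPromo2Month"] = [
        int(p2 == 1 and m in iv)
        for p2, m, iv in zip(out["Promo2"], month_name, intervals)
    ]
    return out

# Toy rows (made up) covering a known opening, an unknown one, and a future one.
toy = pd.DataFrame({
    "Date": ["2015-07-31", "2015-07-31", "2015-06-15"],
    "CompetitionOpenSinceYear": [2014.0, np.nan, 2016.0],
    "CompetitionOpenSinceMonth": [1.0, np.nan, 1.0],
    "CompetitionDistance": [0.0, 999.0, 99.0],
    "Promo2": [1, 0, 1],
    "PromoInterval": ["Jan,Apr,Jul,Oct", None, "Feb,May,Aug,Nov"],
})
features = add_features(toy)
print(features[["CompetitionOpen", "LogCompetitionDistance", "IsPromo2Month"]])
```

The log transform compresses the long right tail of distances, which tends to help models that are sensitive to feature scale; tree ensembles are largely indifferent to it.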

Section 06

Clustering and Modeling Methods

K-Means clustering divides the stores into groups, and several algorithms are then compared. Gradient boosting (GBM) performs best, with an RMSPE of 22.3% on the high-sales group; random forest is robust; neural networks perform weakly. Training runs in two modes: fast iteration on a 40% sample, and full-data training.
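The cluster-then-model workflow can be sketched with scikit-learn as follows. Everything here is illustrative: the data is synthetic, the store profile (mean sales and mean customers), the number of clusters, and the hyperparameters are assumptions, and fit_cluster_models is a hypothetical helper rather than the project's code. The sample_frac argument mimics the fast-iteration mode on a 40% sample.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Step 1: cluster stores on a per-store profile (synthetic).
# Columns: mean daily sales, mean daily customers; 20 low- and 20 high-volume stores.
store_profile = np.vstack([
    rng.normal([5000, 600], [300, 50], size=(20, 2)),
    rng.normal([10000, 1200], [300, 50], size=(20, 2)),
])
scaled = StandardScaler().fit_transform(store_profile)
store_cluster = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scaled)

# Step 2: synthetic daily rows with two features, Promo and DayOfWeek.
n_rows = 4000
row_store = rng.integers(0, 40, size=n_rows)
promo = rng.integers(0, 2, size=n_rows)
day_of_week = rng.integers(1, 8, size=n_rows)
X = np.column_stack([promo, day_of_week])
y = store_profile[row_store, 0] * (1.0 + 0.8 * promo) + rng.normal(0, 200, size=n_rows)
row_cluster = store_cluster[row_store]

def fit_cluster_models(X, y, cluster_ids, sample_frac=1.0, seed=0):
    """Fit one gradient boosting model per cluster, optionally on a subsample."""
    sampler = np.random.default_rng(seed)
    models = {}
    for c in np.unique(cluster_ids):
        idx = np.flatnonzero(cluster_ids == c)
        if sample_frac < 1.0:
            n_keep = max(1, int(len(idx) * sample_frac))
            idx = sampler.choice(idx, size=n_keep, replace=False)
        model = GradientBoostingRegressor(n_estimators=100, random_state=seed)
        model.fit(X[idx], y[idx])
        models[int(c)] = model
    return models

fast_models = fit_cluster_models(X, y, row_cluster, sample_frac=0.4)  # fast iteration
full_models = fit_cluster_models(X, y, row_cluster, sample_frac=1.0)  # full data

# Predicted sales for a Wednesday without and with a promotion, per cluster.
for c, model in full_models.items():
    print(c, model.predict(np.array([[0, 3], [1, 3]])))
```

Fitting a separate model per cluster lets each one specialize on stores with a similar sales level, at the cost of less training data per model. Swapping GradientBoostingRegressor for RandomForestRegressor in the helper would give the random forest comparison.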

Section 07

Result Interpretation and Business Value

The model generated 41,088 predictions on the test set. An RMSPE of 22.3% is acceptable in the retail field. The predictions support three kinds of decisions: optimizing inventory orders, adjusting staff schedules, and evaluating promotion strategies.
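RMSPE (root mean square percentage error) is the square root of the mean squared relative error, so 22.3% means predictions are off by roughly a fifth of actual sales in the root-mean-square sense. A minimal sketch follows, assuming the usual convention from the Kaggle competition that days with zero actual sales are excluded; the function name rmspe and the sample values are illustrative.

```python
import numpy as np

def rmspe(y_true, y_pred) -> float:
    """Root mean square percentage error, skipping rows where y_true is 0."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # The relative error is undefined when actual sales are zero.
    mask = y_true != 0
    pct_error = (y_true[mask] - y_pred[mask]) / y_true[mask]
    return float(np.sqrt(np.mean(pct_error ** 2)))

# Two scored days, each off by 10%; the zero-sales day is ignored.
score = rmspe([100, 200, 0], [90, 220, 50])
print(f"RMSPE: {score:.1%}")
```

Because the error is relative, a 500-euro miss counts for more at a low-volume store than at a high-volume one, which is worth keeping in mind when comparing scores across store clusters.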

Section 08

Experience Summary and Insights

Business understanding takes priority over model tuning; feature engineering needs to combine domain knowledge; model evaluation should align with business goals. The project structure is clear and reproducible, providing a complete reference template for data science learners.