# Spotify Song Popularity Prediction: A Machine Learning Practice Based on Audio Features

> A complete project using Python to analyze Spotify song data and build machine learning models for popularity prediction. Through exploratory data analysis and comparison of multiple regression algorithms, it reveals the key factors influencing song popularity.

- Board: [Openclaw Geo](https://www.zingnex.cn/en/forum/board/openclaw-geo)
- Published: 2026-05-20T13:15:47.000Z
- Last activity: 2026-05-20T13:20:35.439Z
- Popularity score: 154.9
- Keywords: Spotify, machine learning, popularity prediction, music recommendation, random forest, regression analysis, EDA, audio features, Python, data science
- Page link: https://www.zingnex.cn/en/forum/thread/spotify
- Canonical: https://www.zingnex.cn/forum/thread/spotify

---

## Introduction to the Spotify Song Popularity Prediction Project

This project uses Python to run exploratory data analysis (EDA) on Spotify song data and to train machine learning models that predict song popularity and surface the factors behind it. Four regression algorithms are compared (linear regression, decision tree, random forest, gradient boosting), and the random forest model performs best. The results are intended to support decisions in music production and event planning.

## Project Background and Dataset Overview

### Project Background
In the era of music streaming, understanding what drives song popularity matters to producers (who want to create competitive works) and to event planners (who want to raise audience engagement). The goal of this project is to analyze Spotify data, explore the factors that influence popularity, and build a prediction model.
### Dataset
We use the Spotify Tracks Dataset from Kaggle (approximately 114,000 records, 20 fields). It contains the prediction target (popularity), audio features (such as danceability, energy, and loudness), and metadata (artist, genre, duration_ms, etc.). In this data, popularity depends on a combination of features rather than on any single one.

## Project Methods and Workflow

### Team Division
Team GROUP6 divides the work by role: data engineers own the cleaning process; data quality analysts run quality checks; EDA analysts develop the exploratory analysis notebooks; visualization analysts create the charts; all members take part in the modeling phase.
### Key Workflow
1. **EDA**: Explore the data structure, feature distributions, and key relationships (e.g., loudness versus popularity, popularity by genre).
2. **Data Preprocessing**: Drop unneeded columns, handle missing and duplicate values, deduplicate by Track ID, handle outliers with the IQR rule, and standardize features.
3. **Modeling**: Test 4 regression algorithms (linear regression, decision tree, random forest, gradient boosting), evaluate them using MAE/MSE/RMSE/R², and perform hyperparameter tuning.
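
The preprocessing step above can be sketched roughly as follows. This is a minimal illustration, not the team's actual code: the `preprocess` helper, the `demo` rows, and the column names `track_id`, `loudness`, and `energy` are assumptions, and clipping to the IQR fences is only one way to "handle" outliers (the post does not say whether outliers were clipped or dropped).

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def preprocess(df: pd.DataFrame, feature_cols: list[str], id_col: str = "track_id") -> pd.DataFrame:
    """Deduplicate, drop rows with missing features, clip IQR outliers, standardize."""
    out = df.drop_duplicates()                    # remove exact duplicate rows
    out = out.drop_duplicates(subset=id_col)      # keep one row per track ID
    out = out.dropna(subset=feature_cols).copy()  # drop rows missing any feature
    out[feature_cols] = out[feature_cols].astype(float)

    # IQR rule (assumed variant): clip values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
    for col in feature_cols:
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        out[col] = out[col].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

    # Standardize each feature to zero mean and unit variance.
    out[feature_cols] = StandardScaler().fit_transform(out[feature_cols])
    return out


# Made-up rows for illustration only; these are not values from the dataset.
demo = pd.DataFrame({
    "track_id":   ["a", "a", "b", "c", "d", "e", "f"],
    "loudness":   [-5.0, -5.0, -6.0, -7.0, -8.0, -60.0, np.nan],
    "energy":     [0.8, 0.8, 0.7, 0.6, 0.5, 0.4, 0.9],
    "popularity": [60, 60, 55, 50, 45, 10, 70],
})
clean = preprocess(demo, ["loudness", "energy"])
print(clean)
```

In a real pipeline the scaler would normally be fitted on the training split only and then applied to the test split, so that test statistics do not leak into training.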

## Analysis Results and Model Performance

### EDA Findings
- Louder songs tend to be more popular.
- Songs with explicit content have slightly higher average popularity.
- The pop-film, k-pop, and chill genres stand out in popularity.
- The star effect is significant.
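
Checks of this kind can be expressed in a few lines of pandas. The sketch below uses a tiny made-up table, so its numbers say nothing about the real dataset; the column names `track_genre`, `explicit`, `loudness`, and `popularity` are assumed to match the Kaggle file.

```python
import pandas as pd

# Made-up sample for illustration only; not values from the dataset.
tracks = pd.DataFrame({
    "track_genre": ["k-pop", "k-pop", "chill", "chill", "opera", "opera"],
    "explicit":    [False, True, False, False, False, False],
    "loudness":    [-4.0, -5.0, -9.0, -10.0, -18.0, -20.0],
    "popularity":  [70, 66, 55, 50, 20, 15],
})

# Pearson correlation between loudness and popularity.
loudness_corr = tracks["loudness"].corr(tracks["popularity"])

# Average popularity per genre, highest first.
genre_means = (
    tracks.groupby("track_genre")["popularity"].mean().sort_values(ascending=False)
)

# Average popularity for explicit versus non-explicit tracks.
explicit_means = tracks.groupby("explicit")["popularity"].mean()

print(f"loudness vs popularity correlation: {loudness_corr:.2f}")
print(genre_means)
print(explicit_means)
```

A correlation or a group mean only describes association. The explicit-content gap, for instance, may simply reflect which genres contain explicit tracks.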
### Model Performance
Random forest regression performed best of the four models, which suggests it captures nonlinear relationships between the features that a linear model misses. Permutation importance analysis was then used to identify the audio features that most influence popularity.
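
As a rough sketch of how such a comparison and a permutation importance analysis might be set up with scikit-learn: the data below is synthetic, the feature names and the `noise` column are assumptions, and default hyperparameters stand in for the tuned ones, so the scores do not reproduce the project's results.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data: popularity depends on loudness and (nonlinearly) on energy.
rng = np.random.default_rng(0)
n = 400
X = pd.DataFrame({
    "loudness": rng.normal(-8, 3, n),
    "energy": rng.uniform(0, 1, n),
    "noise": rng.normal(0, 1, n),  # irrelevant feature, for contrast
})
y = 40 + 2 * X["loudness"] + 20 * np.sin(3 * X["energy"]) + rng.normal(0, 2, n)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "linear_regression": LinearRegression(),
    "decision_tree": DecisionTreeRegressor(random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}

# Fit each model and collect the four metrics named in the workflow.
rows = []
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    mse = mean_squared_error(y_test, pred)
    rows.append({
        "model": name,
        "MAE": mean_absolute_error(y_test, pred),
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "R2": r2_score(y_test, pred),
    })
results = pd.DataFrame(rows).set_index("model")
print(results.round(3))

# Permutation importance: drop in test score when one feature is shuffled.
perm = permutation_importance(
    models["random_forest"], X_test, y_test, n_repeats=5, random_state=0
)
importances = pd.Series(perm.importances_mean, index=X.columns).sort_values(ascending=False)
print(importances.round(3))
```

Permutation importance is computed on held-out data here, so a feature scores highly only if shuffling it actually hurts predictions on unseen tracks.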

## Key Findings

Key findings of the project:
1. Songs with high energy and high loudness are more likely to be popular;
2. Genres like pop-film, k-pop, and chill have higher average popularity;
3. Songs with explicit content have slightly higher popularity (related to specific genres);
4. Popularity is determined by a combination of multiple features, with no single decisive factor;
5. The star effect remains important in music consumption.

## Practical Application Recommendations

### For Music Producers
Refer to the features of high-popularity songs: higher energy, loudness, and dynamic rhythm; prioritize genres like pop, k-pop, or dance-pop to meet audience preferences.
### For Event Planners
Choose songs with high energy, strong rhythm, or from popular genres to enhance the on-site atmosphere and audience engagement.

## Technical Highlights and Conclusion

### Technical Highlights
- End-to-end workflow: covers every stage from data collection to model evaluation;
- Team collaboration: a clear division of labor, with all members participating in modeling to ensure breadth and quality;
- Multiple model comparison and interpretability analysis: the focus is on business insights rather than accuracy alone.
### Conclusion
This project demonstrates how machine learning can be applied in the music industry. It covers the data science lifecycle from data cleaning to model evaluation and can serve as a reference for learners in related fields. As AI penetrates deeper into the creative industry, such projects are likely to become more valuable.
