# Price Prediction for the German Used Car Market: A Machine Learning Analysis Based on 46,000 Real Data Points

> This article introduces a data analysis project based on over 46,000 used car listings from Germany's AutoScout24 platform. Through data cleaning, exploratory analysis, and machine learning model construction, it reveals key factors affecting used car prices and compares the prediction performance of three models: linear regression, random forest, and gradient boosting.

- Board: [Openclaw Geo](https://www.zingnex.cn/en/forum/board/openclaw-geo)
- Published: 2026-05-28T11:15:41.000Z
- Last activity: 2026-05-28T11:20:39.443Z
- Popularity: 154.9
- Keywords: machine learning, used car price prediction, random forest, German car market, data science, Python, Scikit-learn, AutoScout24, regression analysis, data visualization
- Page link: https://www.zingnex.cn/en/forum/thread/46000
- Canonical: https://www.zingnex.cn/forum/thread/46000
- Markdown source: floors_fallback

---

## Introduction to the German Used Car Market Price Prediction Project

This project analyzes more than 46,000 used car listings from Germany's AutoScout24 platform. After data cleaning and exploratory analysis, three regression models (linear regression, random forest, and gradient boosting) are trained and compared on price prediction. Two findings stand out: horsepower is the feature most strongly correlated with price, and random forest offers the best combination of accuracy and inference speed. The source is the autoscout24-analysis project on GitHub by Andrii Semenov, which also provides an interactive Tableau dashboard.

## Project Background and Research Motivation

As one of Europe's largest car markets, Germany has active used car transactions, but manual price evaluation struggles to fully capture the interactions between multiple factors such as brand, vehicle age, mileage, and horsepower. This project aims to answer the following questions using data science techniques:
- Which brands are most popular in the German market?
- What is the relationship between price, mileage, and horsepower?
- What is the market distribution of different fuel types and transmissions?
- Can prices be accurately predicted based on vehicle features?

## Dataset Overview and Preprocessing

The dataset comes from used car listings on the AutoScout24 platform from 2011 to 2021, originally containing 46,405 records with fields including brand, model, fuel type, transmission type, mileage, horsepower, year, price, etc. Preprocessing steps:
1. Handle missing values: Records with a missing model (143), transmission (182), or horsepower (29) value were dropped, leaving 46,071 valid records.
2. Filter brands: Only the five brands with the highest market share (Volkswagen, Opel, Ford, Skoda, Renault) were kept, leaving 21,772 records for model training.
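
The two preprocessing steps above can be sketched in pandas roughly as follows. This is a minimal illustration, not the project's actual code: the column names (`make`, `model`, `gear`, `hp`, `price`) and the helper name `preprocess` are assumptions, and the toy frame is made up purely to show the behaviour.

```python
import pandas as pd


def preprocess(df: pd.DataFrame, n_brands: int = 5) -> pd.DataFrame:
    """Drop incomplete rows, then keep only the n most frequent brands.

    Column names ("make", "model", "gear", "hp") are assumed for this sketch.
    """
    # Step 1: remove rows with a missing model, transmission, or horsepower.
    cleaned = df.dropna(subset=["model", "gear", "hp"])
    # Step 2: keep the brands that appear most often in the cleaned data.
    top_brands = cleaned["make"].value_counts().nlargest(n_brands).index
    return cleaned[cleaned["make"].isin(top_brands)].reset_index(drop=True)


# Made-up rows for illustration only (not real listings).
toy = pd.DataFrame({
    "make": ["Volkswagen", "Volkswagen", "Opel", "Opel", "Ford", "Fiat"],
    "model": ["Golf", "Polo", "Corsa", None, "Focus", "500"],
    "gear": ["Manual", "Automatic", "Manual", "Manual", None, "Manual"],
    "hp": [110.0, 95.0, 75.0, 90.0, 125.0, None],
    "price": [15000, 12000, 8000, 9000, 11000, 7000],
})

result = preprocess(toy, n_brands=2)
print(result)
```

On the real dataset one would call `preprocess(df)` with the default `n_brands=5`. Note that ranking brands after dropping incomplete rows is a design choice of this sketch; the source does not say in which order the project applied the two steps.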

## Exploratory Data Analysis Results

Analysis of market structure and price drivers:
- **Brand Distribution**: Volkswagen dominates, followed by Opel and Ford.
- **Price Correlations**:
  - Horsepower: Positively correlated with price (+0.75), the strongest predictor.
  - Year: Positively correlated with price (+0.41); newer cars have higher prices, so price falls as vehicle age increases.
  - Mileage: Negatively correlated with price (-0.30); higher mileage goes with lower prices.
- **Market Preferences**:
  - Fuel Type: Gasoline is the most common fuel type.
  - Transmission: Cars with automatic transmissions have a higher average price than manual ones.
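
Correlations and group averages like those listed above can be computed with a few lines of pandas. The sketch below assumes numeric columns named `hp`, `year`, `mileage`, and `price` and a categorical `gear` column; both helper functions are hypothetical, and the toy values are invented, so the numbers they produce are unrelated to the figures reported in this article.

```python
import pandas as pd


def price_correlations(df, features=("hp", "year", "mileage")):
    """Pearson correlation of each numeric feature with price."""
    return df[list(features)].corrwith(df["price"]).sort_values(ascending=False)


def mean_price_by(df, column):
    """Average price per category, highest first."""
    return df.groupby(column)["price"].mean().sort_values(ascending=False)


# Invented values for illustration only.
toy = pd.DataFrame({
    "hp": [75, 90, 110, 150, 190, 60],
    "year": [2012, 2014, 2016, 2018, 2020, 2011],
    "mileage": [150000, 120000, 80000, 50000, 20000, 170000],
    "gear": ["Manual", "Manual", "Manual", "Automatic", "Automatic", "Manual"],
    "price": [5000, 7500, 11000, 18000, 27000, 4000],
})

corr = price_correlations(toy)
by_gear = mean_price_by(toy, "gear")
print(corr)
print(by_gear)
```

Pearson correlation only measures linear association, which is one reason a feature with a moderate coefficient can still be useful to a tree-based model.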

## Machine Learning Model Construction and Performance Comparison

Three regression models were built and compared:

| Model | Mean Absolute Error (MAE) | R² Score |
|------|-------------------|---------|
| Linear Regression | 2,704 EUR | 0.80 |
| Random Forest | 1,615 EUR | 0.91 |
| Gradient Boosting | ~1,643 EUR | 0.91 |
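
For reference, the two metrics in the table are conventionally defined as follows, where $y_i$ is the actual price of listing $i$, $\hat{y}_i$ the predicted price, $\bar{y}$ the mean actual price, and $n$ the number of evaluated listings. These are the standard textbook definitions; the source does not spell out its exact evaluation setup.

$$
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert y_i - \hat{y}_i\rvert,
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}
$$

Read this way, an MAE of 1,615 EUR means predictions miss the actual price by 1,615 EUR on average, and an R² of 0.91 means the model accounts for about 91% of the variance in price.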

Analysis:
- Linear regression serves as the baseline: it explains 80% of the variance in price (R² = 0.80) but has the highest MAE (2,704 EUR).
- Random forest and gradient boosting both reach an R² of 0.91, which suggests they capture non-linear relationships and feature interactions that the linear model misses. Random forest has a slightly lower MAE (1,615 EUR vs. about 1,643 EUR) and faster inference, so it was chosen as the final model.
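
A comparison of this kind can be set up in scikit-learn along the following lines. This is a hedged sketch rather than the project's pipeline: the 80/20 split, the default hyperparameters, the helper name `compare_models`, and the synthetic data are all assumptions, so the scores it prints will not match the table above.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split


def compare_models(X, y, seed=42):
    """Fit three regressors on one train/test split and report MAE and R²."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    models = {
        "Linear Regression": LinearRegression(),
        "Random Forest": RandomForestRegressor(n_estimators=100, random_state=seed),
        "Gradient Boosting": GradientBoostingRegressor(random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        scores[name] = {
            "mae": mean_absolute_error(y_test, pred),
            "r2": r2_score(y_test, pred),
        }
    return scores


# Synthetic stand-in data (assumption): price depends on horsepower, year,
# and mileage, with an hp-by-year interaction that a linear model cannot fit.
rng = np.random.default_rng(0)
n = 500
hp = rng.uniform(60, 250, n)
year = rng.integers(2011, 2022, n)
mileage = rng.uniform(0, 200_000, n)
price = 60 * hp + 6 * hp * (year - 2011) - 0.03 * mileage + rng.normal(0, 500, n)

X = np.column_stack([hp, year, mileage])
scores = compare_models(X, price)
for name, s in scores.items():
    print(f"{name}: MAE={s['mae']:.0f}, R2={s['r2']:.2f}")
```

With real listings, categorical fields such as brand, fuel type, and transmission would also need encoding (for example one-hot encoding) before being passed to these estimators.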

## Practical Significance and Application Scenarios

The project results have practical value for multiple parties:
- **Buyers**: An objective price reference helps avoid overpaying due to information asymmetry.
- **Sellers**: Knowing the key price drivers helps optimize sales strategy (e.g., emphasizing the advantage of high horsepower).
- **Financial and Insurance Industries**: Predicted prices can serve as a basis for risk assessment and product pricing.
- **Learners**: The project is an end-to-end machine learning example covering the full process of data processing, EDA, modeling, and visualization.

## Limitations and Summary Insights

**Limitations**:
- The model covers only the top five brands, so its predictive ability for niche and luxury brands is limited.
- It does not include important factors such as vehicle configuration, accident history, and maintenance records.
- It does not consider external factors such as macroeconomic conditions and fuel prices.

**Summary**:
This project demonstrates that machine learning can predict used car prices effectively. The key lies in solid data preprocessing, in-depth exploratory analysis, and systematic model selection. These principles transfer to similar scenarios such as real estate valuation and equipment residual value assessment.
