Zing Forum

Football Transfer Market Value Analysis Platform: Using Econometrics and Machine Learning to Identify Undervalued Players

A football analytics platform that combines econometrics and machine learning. Using multi-source data integration, an XGBoost valuation model, explainable AI, and a decision support system, it helps scouting departments identify undervalued players in the transfer market.

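Tags: football analytics, transfer market, machine learning, XGBoost, sports data, scouting system, decision support system, SHAP, explainable AI, econometrics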
Published 2026-06-06 08:15 | Recent activity 2026-06-06 08:20 | Estimated read 6 min

Section 01

[Introduction] Core Overview of the Football Transfer Market Value Analysis Platform

This article introduces a football transfer market value analysis platform that combines econometrics and machine learning. Its aim is to help clubs identify undervalued players, meaning players whose model valuation is significantly higher than their market valuation, and so act on "buy low, sell high" transfer opportunities. The platform has grown into a complete Decision Support System (DSS) with modules such as opportunity detection and risk assessment, giving scouting departments a data-driven basis for decisions.

Section 02

Project Background: Pain Points and Needs in the Transfer Market

The football transfer market suffers from information asymmetry and high uncertainty. Traditional scouting is limited by subjective bias, incomplete information, and resource constraints, which makes it hard to screen target players efficiently. The core question of this project (a master's graduation project) is: which players have a market value below what their athletic characteristics, age, experience, and recent performance would justify? Identifying such market inefficiencies helps clubs make better transfer decisions.

Section 03

Technical Methodology: From Data Integration to Model Interpretation

The project adopts the CRISP-DM process, with key steps including:

  1. Multi-source Data Integration: Player statistics from FBref are merged with market valuations from Transfermarkt, with a matching rate of 88%. The final dataset contains 2136 players and 3916 records;
  2. Dual-track Modeling: An econometric baseline model (chosen for interpretability) is paired with an XGBoost machine learning model (chosen for predictive ability, validation set R²=0.5414);
  3. Explainable AI: SHAP values are used to explain individual model predictions and build user trust;
  4. Multi-criteria Scoring: An "opportunity score" and a risk framework convert model outputs into intuitive indicators.
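
The matching step in item 1 can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the join key (accent-stripped name plus birth year), the field names, and the sample rows are all assumptions made for the example.

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Lowercase, strip accents and collapse whitespace in a player name."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.lower().split())


def match_players(stats_rows, valuation_rows):
    """Join stats rows to valuation rows on (normalized name, birth year).

    Returns the matched rows, the unmatched stats rows, and the share of
    stats rows that found a valuation (the matching rate).
    """
    valuations = {
        (normalize_name(v["name"]), v["birth_year"]): v for v in valuation_rows
    }
    matched, unmatched = [], []
    for row in stats_rows:
        key = (normalize_name(row["name"]), row["birth_year"])
        if key in valuations:
            merged = {**row, "market_value_eur": valuations[key]["market_value_eur"]}
            matched.append(merged)
        else:
            unmatched.append(row)
    match_rate = len(matched) / len(stats_rows) if stats_rows else 0.0
    return matched, unmatched, match_rate


# Hypothetical rows: the names and numbers are made up for illustration.
stats = [
    {"name": "José  Example", "birth_year": 2001, "goals": 7},
    {"name": "Unknown Player", "birth_year": 1999, "goals": 2},
]
valuations = [
    {"name": "jose example", "birth_year": 2001, "market_value_eur": 4_000_000},
]
matched, unmatched, rate = match_players(stats, valuations)
print(f"matching rate: {rate:.0%}")
```

Keeping the unmatched rows rather than silently dropping them makes it possible to inspect why a share of players (12% in the reported figures) could not be linked.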

Section 04

Core Functional Modules: Detailed Explanation of the Decision Support System

The platform has been upgraded to a decision support system, including the following modules:

  • Opportunity Detection: Automatically scans the market for players whose model valuation is higher than their market valuation, taking into account factors such as age, position, and league;
  • Risk Assessment: Assigns a risk score to each recommended player, covering dimensions such as injury history, performance stability, and remaining contract years;
  • Recruitment Intelligence: Provides in-depth player profiles (technical characteristics, tactical adaptability, and similar attributes);
  • Candidate Comparison: Supports side-by-side comparison of key indicators across multiple players;
  • Recruitment Dashboard: An interactive dashboard that visualizes candidate lists, budget allocation, and related information.
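
How opportunity detection and risk assessment might combine into a single ranking can be sketched as below. The article does not give the scoring formula, so everything here is an assumption: the relative valuation gap, the 20% threshold, the single aggregate risk value in [0, 1], and the rule of discounting the gap by risk are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    model_value: float   # model valuation in EUR
    market_value: float  # market valuation in EUR
    risk: float          # hypothetical aggregate risk, 0.0 (low) to 1.0 (high)


def valuation_gap(c: Candidate) -> float:
    """Relative gap: how far the model valuation exceeds the market valuation."""
    return (c.model_value - c.market_value) / c.market_value


def opportunity_score(c: Candidate) -> float:
    """Valuation gap discounted by risk (an assumed rule, not the article's)."""
    return valuation_gap(c) * (1.0 - c.risk)


def detect_opportunities(candidates, min_gap=0.20):
    """Keep players whose gap reaches min_gap, ranked by opportunity score."""
    flagged = [
        c for c in candidates
        if c.market_value > 0 and valuation_gap(c) >= min_gap
    ]
    return sorted(flagged, key=opportunity_score, reverse=True)


# Hypothetical candidates with made-up valuations.
pool = [
    Candidate("Player A", model_value=12e6, market_value=8e6, risk=0.5),
    Candidate("Player B", model_value=10e6, market_value=8e6, risk=0.1),
    Candidate("Player C", model_value=5e6, market_value=5e6, risk=0.0),
]
for c in detect_opportunities(pool):
    print(c.name, round(valuation_gap(c), 3), round(opportunity_score(c), 3))
```

A multiplicative discount is only one option. A weighted sum of gap and risk, or separate thresholds for each, would fit the same "opportunity score plus risk framework" description.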

Section 05

Key Outcome Indicators: System Effectiveness Verification

The project reports the following core indicators:

Indicator                                        Value
Matching rate between FBref and Transfermarkt    88%
Number of analyzed players                       2136
Modelable observation records                    3916
XGBoost model R² (validation set)                0.5414
Top-10 recommendation accuracy                   90%

The 90% Top-10 recommendation accuracy means that 9 of the 10 highest-ranked opportunities recommended by the system were genuinely undervalued market opportunities, which supports the practical value of the system.
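
For readers who want to reproduce indicators of this kind, the two model-quality figures can be computed as sketched below. This is an illustration under assumptions: the article does not say how a recommendation was confirmed as a real opportunity, so `confirmed_ids` is a placeholder for whatever ground truth is used, and the sample numbers are made up.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def precision_at_k(ranked_ids, confirmed_ids, k=10):
    """Share of the top-k ranked items that appear in the confirmed set."""
    top_k = ranked_ids[:k]
    hits = sum(1 for pid in top_k if pid in confirmed_ids)
    return hits / len(top_k)


# Hypothetical example: 10 ranked player ids, 9 of them confirmed.
ranked = list(range(1, 11))
confirmed = set(range(1, 10))
print(precision_at_k(ranked, confirmed, k=10))
```

An R² of 0.5414 on this definition means the model accounts for a little over half of the variance in the validation-set target, so individual valuations still carry substantial error.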

Section 06

Limitations and Future Improvement Directions

The current version has the following limitations, each pointing to a direction for improvement:

  • Data Coverage: The data mainly covers the major European leagues; it should be extended to smaller leagues and non-European markets;
  • Model Optimization: Ensemble approaches, deep learning, and other methods could be tried to improve predictive ability;
  • Real-time Performance: Real-time data streams need to be integrated to keep the system's outputs current;
  • External Factors: Non-statistical factors such as injuries, locker room dynamics, and coaches' tactical preferences still need to be incorporated.