Zing Forum

Hybrid Machine Learning Model for Football Match Prediction: A Practical Fusion of LSTM, MLP, and XGBoost

This article introduces a football match prediction system built on a hybrid machine learning model that combines three algorithms: LSTM, MLP, and XGBoost. GitHub Actions updates the data and predictions automatically each day, and a web interface displays the results.

Tags: Machine learning, Football prediction, LSTM, XGBoost, Hybrid model, GitHub Actions, Data pipeline, Sports data analytics
Published 2026-05-31 10:45 | Recent activity 2026-05-31 10:48 | Estimated read 8 min

Section 01

Introduction: Overview of the Hybrid ML Model for Football Match Prediction Project

The PredictingFootballMatchesWorkflow project predicts football match outcomes with a hybrid machine learning model that combines three algorithms: LSTM, MLP, and XGBoost. GitHub Actions refreshes the data and predictions daily, and a web interface displays them. The project began as a graduation project at the University of York (UK), where it received a first-class honors grade, and has since been expanded into a complete prediction workflow.

Section 02

Project Background and Origin

Football match prediction is a popular topic in data science, but a single model often struggles to capture how team performance changes over time. This project started as a graduation project for a degree program at the University of York (UK), where it earned a first-class honors mark of 76%. It was later extended with automated data updates and a web display interface, turning it into a complete prediction workflow system.

Section 03

Project Architecture and Technology Selection

Hybrid Model Design

  • LSTM: Processes time-series data, using metrics such as shot counts and goals from the past 10 matches to capture trends in team form.
  • MLP: Handles static features such as ELO ratings (which quantify team strength) and a binary indicator for promoted teams, and extracts higher-level representations from them.
  • XGBoost: Analyzes performance patterns over different windows (the latest 5 or 10 matches, home or away games) to supply additional signal.
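
The windowing described above can be illustrated with a minimal sketch. It is not the project's code: the record layout, the function names `last_n_window` and `window_averages`, and the sample numbers are all hypothetical, and only the idea of "the most recent N matches" comes from the article.

```python
from statistics import mean

def last_n_window(matches, n=10):
    """Return the most recent n matches (oldest first), or fewer if history is short."""
    return matches[-n:]

def window_averages(matches, n, keys=("shots", "goals")):
    """Average each statistic over the most recent n matches."""
    window = last_n_window(matches, n)
    if not window:
        return {key: 0.0 for key in keys}
    return {key: mean(m[key] for m in window) for key in keys}

# Hypothetical per-match records for one team, oldest first.
history = [
    {"shots": 10, "goals": 1},
    {"shots": 14, "goals": 2},
    {"shots": 8, "goals": 0},
    {"shots": 12, "goals": 3},
]

# Sequence input (one row per match), the kind of shape an LSTM consumes.
sequence_for_lstm = [[m["shots"], m["goals"]] for m in last_n_window(history, n=3)]

# Aggregated window features, the kind of flat row a tree model consumes.
tabular_for_xgboost = window_averages(history, n=2)
```

The same history feeds both views: the sequence keeps match order for the recurrent model, while the averages collapse a window into a few columns.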

Data Pipeline

  • Update Dataset.py: Scrapes the latest match results from football-data.co.uk and incrementally updates the dataset.
  • Fixture Scrape.py: Retrieves fixtures for Europe's top five leagues (Premier League, La Liga, Serie A, Ligue 1, Bundesliga) via the football-data.org API.
  • StandardiseFixtures.py: Unifies column names and team name formats across different data sources.
  • Testing.py: Loads pre-trained models, generates win/draw/loss probability predictions, and updates results and time records.
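
As an illustration of the standardisation step, here is a minimal sketch of how column names and team names from two sources could be unified. The alias table, the column mapping, and the function names are hypothetical and are not taken from StandardiseFixtures.py.

```python
# Hypothetical alias table: spellings seen in different sources -> one canonical name.
TEAM_ALIASES = {
    "man united": "Manchester United",
    "manchester united fc": "Manchester United",
    "wolves": "Wolverhampton Wanderers",
    "wolverhampton wanderers fc": "Wolverhampton Wanderers",
}

def standardise_team(name):
    """Map a team name to its canonical form; unknown names are only trimmed."""
    key = " ".join(name.lower().split())  # lower-case and collapse whitespace
    return TEAM_ALIASES.get(key, name.strip())

def standardise_fixture(fixture, column_map):
    """Rename columns via column_map, then canonicalise the two team columns."""
    renamed = {column_map.get(col, col): value for col, value in fixture.items()}
    for side in ("home_team", "away_team"):
        if side in renamed:
            renamed[side] = standardise_team(renamed[side])
    return renamed

# Hypothetical source column names mapped to a shared schema.
COLUMN_MAP = {"HomeTeam": "home_team", "AwayTeam": "away_team"}

example = standardise_fixture(
    {"HomeTeam": "Man United", "AwayTeam": " Wolves ", "Date": "2026-05-30"},
    COLUMN_MAP,
)
```

Keeping the aliases in one table means a new spelling from any source is handled by adding a single entry rather than changing code.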

Section 04

Model Training and Prediction Mechanism

Training Data and Feature Engineering

Training uses historical match data from multiple seasons. The features include:

  • Match statistics features (shot count, goals scored, possession rate)
  • Team strength features (ELO rating)
  • Time-series features (performance trends from the past 10 matches)
  • Contextual features (home/away status, promoted team indicator)
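
For readers unfamiliar with ELO ratings, the standard Elo update can be sketched as follows. The article does not say which variant the project uses, so the K-factor of 20, the absence of a home-advantage term, and the function names are assumptions made for illustration.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score of side A against side B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update_elo(home, away, result, k=20.0):
    """Return updated (home, away) ratings.

    result is from the home side's view: 1.0 win, 0.5 draw, 0.0 loss.
    k is an assumed K-factor, not a value taken from the project.
    """
    delta = k * (result - expected_score(home, away))
    # Points gained by one side are lost by the other, so the total is conserved.
    return home + delta, away - delta

new_home, new_away = update_elo(1500.0, 1500.0, result=1.0)
```

With equal starting ratings the expected score is 0.5, so a home win moves each rating by half the K-factor in opposite directions.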

Prediction Output

For each match, the system outputs three probabilities: home win, draw, and away win. This is more informative than a single predicted outcome because it also shows how confident the model is.

Section 05

Web Interface and Automated Deployment

Frontend Display

The football-predictor-ui folder contains complete frontend code (HTML/CSS/JS) that displays prediction probabilities for upcoming matches and data update times.

GitHub Actions Automation

A GitHub Actions workflow runs data scraping, model prediction, and the website update every day without manual intervention, so predictions are always based on the latest data.
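
The daily job amounts to running the pipeline scripts in sequence. Below is a minimal sketch of such a runner. The script names come from the article, but the execution order, the function names, and the stop-on-first-failure behaviour are assumptions; the actual workflow definition is not shown in the source.

```python
import subprocess
import sys

# Script names are from the article; this order is an assumption.
PIPELINE_STEPS = [
    "Update Dataset.py",
    "Fixture Scrape.py",
    "StandardiseFixtures.py",
    "Testing.py",
]

def run_script(script):
    """Run one script with the current interpreter and return its exit code."""
    return subprocess.run([sys.executable, script]).returncode

def run_pipeline(steps, runner=run_script):
    """Run steps in order and stop at the first non-zero exit code.

    Returns (completed_steps, failed_step); failed_step is None on success.
    """
    completed = []
    for step in steps:
        if runner(step) != 0:
            return completed, step
        completed.append(step)
    return completed, None
```

A scheduled job could call `run_pipeline(PIPELINE_STEPS)` and exit with a non-zero status when the second element of the result is not `None`, so that a failed scrape does not publish stale predictions. The `runner` parameter exists so the ordering logic can be exercised without executing the real scripts.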

Section 06

Technical Highlights and Practical Value

Advantages of Multi-Model Fusion

Each model has limitations on its own. LSTM excels at time-series data but does not naturally use static features. XGBoost is strong on engineered tabular features but does not model sequences directly. MLP has a simple structure but limited ability to capture complex temporal patterns. Fusing the three lets their strengths complement one another and makes the predictions more robust.
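
One simple way to fuse several models' outputs is a weighted average of their class probabilities (often called soft voting). The sketch below shows that technique only; the article does not state how the project actually combines the three models, and the function name, weights, and probability values here are hypothetical.

```python
def fuse_probabilities(model_probs, weights=None):
    """Weighted average of per-model [home, draw, away] probability lists."""
    if weights is None:
        weights = [1.0] * len(model_probs)  # equal weights by default
    total = sum(weights)
    fused = [
        sum(w * probs[i] for w, probs in zip(weights, model_probs)) / total
        for i in range(3)
    ]
    norm = sum(fused)  # renormalise in case the inputs do not sum exactly to 1
    return [p / norm for p in fused]

# Hypothetical outputs for a single match, ordered [home win, draw, away win].
lstm_probs = [0.5, 0.3, 0.2]
mlp_probs = [0.4, 0.3, 0.3]
xgb_probs = [0.6, 0.2, 0.2]

fused = fuse_probabilities([lstm_probs, mlp_probs, xgb_probs])
predicted_index = fused.index(max(fused))  # 0 = home, 1 = draw, 2 = away
```

Averaging probabilities rather than hard labels keeps each model's confidence in play, so one uncertain model cannot outvote two confident ones.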

Engineering Practice Value

The project has moved from prototype to working product: the complete data pipeline, automated updates, and user-friendly web interface make it a practical prediction tool.

Scalability

The modular architecture facilitates integration of new data sources and addition of new model components, making it suitable for building production-level systems.

Section 07

Summary and Future Outlook

This project demonstrates an application of machine learning to sports prediction. Its hybrid model architecture, complete pipeline, and automated deployment can serve as a reference for similar scenarios. For developers, it is a useful case study in machine learning engineering practice, and its modular design makes it straightforward to extend.

Possible future extensions:

  • Support more leagues
  • Integrate real-time odds data
  • Develop a mobile interface
  • Explore advanced deep learning architectures such as Transformers