
Football Match Prediction System Based on LightGBM and ELO Rating

A Streamlit application that predicts football match results using LightGBM, a Multi-Layer Perceptron (MLP), and the ELO rating system.

Tags: Football prediction, Machine learning, LightGBM, ELO rating, Streamlit, Sports data science, Neural networks

Published 2026-05-28 03:15 · Recent activity 2026-05-28 03:23

Section 01

Introduction: Football Match Prediction System Based on LightGBM and ELO Rating

This project was developed and open-sourced on GitHub by edu-moraess (https://github.com/edu-moraess/Football_PredictorML, released 2026-05-27). It predicts football match results by combining LightGBM gradient-boosted trees, a Multi-Layer Perceptron (MLP) neural network, and the ELO rating system, and it serves the predictions through an interactive application built with Streamlit. The hybrid design keeps the interpretability of ELO ratings, which encode domain knowledge about team strength, while adding the data-driven predictive power of machine learning.

Section 02

Background: Intersection of Data Science and Sports Competition

Sports prediction is a popular application area for data science. From setting betting odds to analyzing team tactics, machine learning is changing how sports matches are understood. This project applies that idea to football by pairing the classic ELO rating system with modern machine learning algorithms.

Section 03

Core Methods: Dual-Model Architecture and ELO Rating System

Dual-Model Fusion Strategy

The system uses two models: LightGBM, which handles structured tabular data well and discovers feature interactions automatically, and an MLP, which captures more complex non-linear patterns. Their outputs are fused by weighted averaging or stacking into a single probability for each outcome: home win, draw, or away win.
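The weighted-average option can be sketched in a few lines. This is a minimal illustration, not the project's code: it assumes each model already outputs a probability vector ordered as (home win, draw, away win), and the names `blend_probabilities`, `predict_outcome`, and the weight `weight_lgbm` are hypothetical.

```python
# Hypothetical outcome order: index 0 = home win, 1 = draw, 2 = away win.
OUTCOMES = ("home_win", "draw", "away_win")


def blend_probabilities(p_lgbm, p_mlp, weight_lgbm=0.5):
    """Weighted average of two models' outcome probabilities for one match.

    weight_lgbm is a hypothetical tuning parameter; in practice it would be
    chosen on a held-out time period, e.g. by minimising log loss.
    """
    if not 0.0 <= weight_lgbm <= 1.0:
        raise ValueError("weight_lgbm must be in [0, 1]")
    if len(p_lgbm) != len(OUTCOMES) or len(p_mlp) != len(OUTCOMES):
        raise ValueError("expected one probability per outcome")
    blended = [weight_lgbm * a + (1.0 - weight_lgbm) * b
               for a, b in zip(p_lgbm, p_mlp)]
    total = sum(blended)
    # Renormalise so the result sums to 1 even if the inputs drift slightly.
    return [p / total for p in blended]


def predict_outcome(probs):
    """Return the label of the most probable outcome."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return OUTCOMES[best]


if __name__ == "__main__":
    fused = blend_probabilities([0.55, 0.25, 0.20], [0.45, 0.30, 0.25],
                                weight_lgbm=0.6)
    print([round(p, 2) for p in fused], predict_outcome(fused))
```

Stacking would instead train a small meta-model (for example, a logistic regression) on the two models' out-of-fold probabilities, letting the data decide how much to trust each model per outcome rather than using one fixed weight.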

Application of ELO Rating System

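Expected Score: Computed from the rating gap between the two teams, E_A = 1/(1 + 10^((R_B - R_A)/400)), where R_A and R_B are the teams' ratings and a draw counts as half a win
Dynamic Adjustment: After each match, ratings move in proportion to the gap between the actual result S_A (1 for a win, 0.5 for a draw, 0 for a loss) and the expected score, R_A' = R_A + K(S_A - E_A); because an underdog's expected score is low, an upset earns it more points
Home Advantage: Add a fixed bonus of about 100 ELO points to the home team's rating when computing the expected score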
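The three rules above fit into a short update function. This sketch follows the standard ELO formulas; the 100-point home bonus comes from the article, while the K-factor of 20 and the function names are assumptions for illustration, since the article does not state them.

```python
HOME_ADVANTAGE = 100.0  # "about 100 points", as described in the article
K_FACTOR = 20.0         # assumption: the article does not give a K value


def expected_score(rating_a, rating_b):
    """E_A = 1 / (1 + 10^((R_B - R_A) / 400))."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def match_score(home_goals, away_goals):
    """Actual result from the home side's view: 1 win, 0.5 draw, 0 loss."""
    if home_goals > away_goals:
        return 1.0
    if home_goals == away_goals:
        return 0.5
    return 0.0


def update_ratings(r_home, r_away, home_goals, away_goals,
                   k=K_FACTOR, home_advantage=HOME_ADVANTAGE):
    """Return the new (home, away) ratings after one match.

    The home bonus only affects the expected score; it is not stored in
    the rating itself. Points gained by one side are lost by the other.
    """
    e_home = expected_score(r_home + home_advantage, r_away)
    delta = k * (match_score(home_goals, away_goals) - e_home)
    return r_home + delta, r_away - delta


if __name__ == "__main__":
    # A 1400-rated underdog beats a 1600-rated side away from home.
    print(update_ratings(1600.0, 1400.0, 0, 1))
```

Because the underdog's expected score is small, the term S_A - E_A is close to 1 after an upset, which is exactly why upsets move ratings further than expected wins.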

Feature Engineering

The feature set covers four groups, and all features are standardized before training:
ELO features: current rating, rating gap, recent rating changes
Head-to-head history: results of the last N meetings, goals scored
Recent form: win rate over the last 5 and 10 matches, average goals scored and conceded per match
League standing: current rank, points gap
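Form features are easy to leak by accident, because a naive rolling mean includes the match being predicted. The sketch below is an assumption about how such features could be built, not the project's pipeline: `rolling_form`, `fit_scaler`, and `transform` are hypothetical names, each team's history is assumed to be a chronological list of (goals for, goals against), and only matches strictly before the current one are used. The scaler is fitted on training data alone so test-period statistics do not leak into training.

```python
from collections import deque
from statistics import mean, pstdev


def rolling_form(results, window=5):
    """Per-match form features for one team, using only earlier matches.

    results: chronological list of (goals_for, goals_against) tuples.
    Returns one dict per match, or None when no earlier match exists.
    """
    history = deque(maxlen=window)  # keeps only the last `window` matches
    features = []
    for goals_for, goals_against in results:
        if history:
            wins = sum(1 for f, a in history if f > a)
            features.append({
                "win_rate": wins / len(history),
                "avg_scored": mean([f for f, _ in history]),
                "avg_conceded": mean([a for _, a in history]),
            })
        else:
            features.append(None)
        # Add the current match only after its features are computed.
        history.append((goals_for, goals_against))
    return features


def fit_scaler(train_values):
    """Mean and population standard deviation from the training period."""
    return mean(train_values), pstdev(train_values)


def transform(values, mu, sigma):
    """Standardize values with statistics fitted on training data."""
    return [(v - mu) / sigma if sigma > 0 else 0.0 for v in values]


if __name__ == "__main__":
    print(rolling_form([(2, 0), (1, 1), (0, 3)], window=5))
```

Running the same function with window=10 gives the longer form window mentioned above.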

Section 04

Technical Implementation: Streamlit Deployment and Model Evaluation

Advantages of Streamlit Framework

Pure Python development, with no front-end knowledge required
Automatic reruns when the source file is saved, which speeds up development and debugging
A rich component library and straightforward deployment (Streamlit Community Cloud or Docker)

Model Training and Evaluation

Time Series Cross-Validation: Training and test sets are split chronologically, so the model is never trained on matches played after the ones it is evaluated on
Class Imbalance Handling: The three outcomes are not equally frequent; the project may use class weighting or resampling to compensate
Evaluation Metrics: Log loss (probability calibration), ROC-AUC (discrimination ability), and a confusion matrix (per-class performance)
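The evaluation steps above can be illustrated with standard-library Python. This is a hedged sketch rather than the project's code: it assumes rows are already sorted by match date and labels are encoded 0 = home win, 1 = draw, 2 = away win. The expanding-window split is similar in spirit to scikit-learn's `TimeSeriesSplit` but not identical to it, and the class weights use the common "balanced" formula n_samples / (n_classes × class_count).

```python
import math
from collections import Counter


def time_series_splits(n_samples, n_splits=3):
    """Yield expanding-window (train_idx, test_idx) pairs over date-sorted rows.

    Every test index comes after every train index, so no future match
    leaks into training. The last fold absorbs any remainder rows.
    """
    if n_samples < n_splits + 1:
        raise ValueError("not enough samples for the requested splits")
    fold = n_samples // (n_splits + 1)
    for i in range(1, n_splits + 1):
        train_end = fold * i
        test_end = fold * (i + 1) if i < n_splits else n_samples
        yield list(range(train_end)), list(range(train_end, test_end))


def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log probability assigned to the true outcome."""
    total = 0.0
    for label, p in zip(y_true, probs):
        # Clip to avoid log(0) for overconfident wrong predictions.
        total -= math.log(min(max(p[label], eps), 1.0 - eps))
    return total / len(y_true)


def balanced_class_weights(labels):
    """Weight rarer outcomes more: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * count) for c, count in counts.items()}


if __name__ == "__main__":
    for train_idx, test_idx in time_series_splits(10, n_splits=3):
        print(len(train_idx), test_idx)
    print(multiclass_log_loss([1, 2], [[0.2, 0.5, 0.3], [0.1, 0.1, 0.8]]))
    print(balanced_class_weights([0, 0, 0, 1, 2, 2]))
```

Log loss rewards well-calibrated probabilities rather than just correct picks, which matters when the output is used as a home/draw/away probability rather than a single label.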

Section 05

Application Scenarios and System Limitations

Application Scenarios

Betting decision support (subject to local regulations)
Fantasy sports lineup optimization
Data support for media analysis
Evaluation of team tactics and strategy

Limitations

Random Factors: Events such as red cards and penalties are hard to quantify
Data Quality: Data for lower-tier leagues is often incomplete
Concept Drift: Changes in squads and tactics can make historical patterns obsolete, so the model needs regular retraining

Section 06

Conclusion: Project Value and Recommendations for Getting Started in Sports Data Science

This project demonstrates the practical value of combining a classic statistical method (ELO) with modern machine learning, and Streamlit lowers the barrier to turning an experiment into an application. For beginners, it serves as an end-to-end reference from feature engineering to deployment, and it shows that prediction models must balance accuracy, interpretability, and practicality.