Zing Forum


From Game Data to Production: Practical Experience in Building an End-to-End MLOps Platform

An in-depth analysis of a Dota 2-based machine learning project, showing how to engineer the full workflow of a neural network model from training to deployment, covering data collection, model training, containerization, and Kubernetes deployment.

Tags: MLOps, Kubernetes, Dota 2, Machine Learning, Terraform, GitOps, ArgoCD, Neural Networks, Model Deployment, DevOps
Published 2026-05-28 00:44 | Recent activity 2026-05-28 00:48 | Estimated read 8 min

Section 01

From Dota 2 to Production: An MLOps Platform in Practice

The dota2metalab-infra project described in this article uses Dota 2 hero draft prediction as its scenario to address common pain points in moving machine learning models from the lab to production, such as broken data pipelines, chaotic model versioning, and manual deployment. The project builds a complete end-to-end MLOps pipeline, reaches a prediction accuracy of 73%, and automates deployment with a cloud-native stack (Kubernetes, Terraform, GitOps, and related tools), offering a reference example for engineering ML projects.


Section 02

Project Background and Core Challenges

Background

Machine learning projects often hit the same wall: models perform well in the lab but run into numerous issues once deployed to production. This project uses Dota 2 hero draft prediction as a vehicle to demonstrate a complete set of MLOps practices, with an automated pipeline from data collection to production deployment.

Core Challenges

As a complex competitive game, Dota 2 presents the following challenges:

  1. A vast hero combination space (10 heroes are picked from a pool of over 120)
  2. Non-linear synergy and counter effects between heroes
  3. Adversarial drafting, where each team's picks respond to the opponent's
  4. Game version updates that change hero strength

These factors increase the difficulty of developing and maintaining prediction models.
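To give a sense of scale for the first point, the count can be worked out directly. The sketch below assumes a pool of exactly 120 heroes (the article only says "over 120"), two teams of five, and no duplicate picks. It ignores bans and pick order. The function name `draft_space` is made up for this illustration.

```python
from math import comb


def draft_space(n_heroes: int, team_size: int = 5) -> int:
    """Count distinct (team A, team B) drafts with no hero picked twice.

    Pick order and bans are ignored; the two sides are treated as distinct.
    """
    first_team = comb(n_heroes, team_size)
    second_team = comb(n_heroes - team_size, team_size)
    return first_team * second_team


if __name__ == "__main__":
    # 120 is an assumed pool size; the article only says "over 120".
    print(f"{draft_space(120):,} possible drafts")
```

Under these assumptions the result is on the order of 10^16 drafts, far more than the roughly 17,000 matches in the dataset, so the model has to generalize rather than memorize.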


Section 03

Technical Architecture and Implementation Methods

The project adopts a layered MLOps architecture, with each module developed and tested independently:

Data Layer

Data from over 17,000 high-rank matches was collected, including hero selection sequences, players' historical win rates, hero synergy/counter relationships, and match results. Python scripts fetch the raw data from the official API; feature engineering then cleans and transforms it into a model-ready format.
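As one illustration of turning a draft into a model-ready format, the sketch below builds a multi-hot vector with one slot per hero for each side. The source does not describe the project's actual feature pipeline. The names `encode_draft` and `N_HEROES`, and the assumption that hero IDs are contiguous integers starting at 0, are hypothetical.

```python
from typing import List, Sequence

N_HEROES = 130  # assumed upper bound on hero IDs; not taken from the project


def encode_draft(radiant: Sequence[int], dire: Sequence[int],
                 n_heroes: int = N_HEROES) -> List[float]:
    """Return a vector of length 2 * n_heroes.

    Slots [0, n_heroes) mark Radiant picks; slots [n_heroes, 2 * n_heroes)
    mark Dire picks.
    """
    if set(radiant) & set(dire):
        raise ValueError("a hero cannot be picked by both teams")
    vector = [0.0] * (2 * n_heroes)
    for hero_id in radiant:
        if not 0 <= hero_id < n_heroes:
            raise ValueError(f"hero id out of range: {hero_id}")
        vector[hero_id] = 1.0
    for hero_id in dire:
        if not 0 <= hero_id < n_heroes:
            raise ValueError(f"hero id out of range: {hero_id}")
        vector[n_heroes + hero_id] = 1.0
    return vector
```

Keeping the two sides in separate halves of the vector lets a model learn that the same hero matters differently as an ally than as an opponent. Numeric features such as win rates or synergy scores could be appended to this vector.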

Model Layer

A neural network is used to capture non-linear interactions between heroes. Input features include hero ID sequences, historical win rates, synergy scores, and team balance metrics. Test-set accuracy reaches 73%, a respectable result given the game's inherent uncertainty.
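The source does not describe the network architecture. As a rough illustration of how a neural network turns a feature vector into a win probability, here is a minimal one-hidden-layer forward pass in plain Python. The layer sizes, random weights, and function names are assumptions for this sketch. A real project would more likely use a framework such as PyTorch or TensorFlow with trained weights.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Matrix:
    """Random weights, one row per output unit (last column is the bias)."""
    scale = 1.0 / math.sqrt(n_in)
    return [[rng.uniform(-scale, scale) for _ in range(n_in + 1)]
            for _ in range(n_out)]


def dense(weights: Matrix, x: List[float]) -> List[float]:
    """Affine transform: w . x + b for every output unit."""
    return [sum(w * v for w, v in zip(row[:-1], x)) + row[-1]
            for row in weights]


def predict_win_probability(x: List[float], hidden: Matrix,
                            output: Matrix) -> float:
    """ReLU hidden layer followed by a sigmoid output in (0, 1)."""
    h = [max(0.0, z) for z in dense(hidden, x)]
    logit = dense(output, h)[0]
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    rng = random.Random(0)
    features = [1.0, 0.0, 0.55, 0.3]  # toy feature vector
    hidden = init_layer(len(features), 8, rng)
    output = init_layer(8, 1, rng)
    print(predict_win_probability(features, hidden, output))
```

The ReLU in the hidden layer is what lets the output depend on combinations of inputs rather than on each input independently, which is the property the article relies on for modeling synergy and counter effects.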

Service Layer

The model is packaged as a containerized REST API service that supports real-time and batch prediction, version management, and A/B testing.
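One common way to support A/B testing behind a single prediction endpoint is deterministic, hash-based traffic splitting, so that the same caller always reaches the same model version. The source does not say how the project routes traffic. The function name, the version labels, and the 90/10 split below are assumptions.

```python
import hashlib
from typing import Dict


def choose_model_version(request_key: str, split: Dict[str, float]) -> str:
    """Map a stable key (e.g. a user or match ID) to a model version.

    `split` maps version label -> traffic share; shares must sum to 1.
    """
    if abs(sum(split.values()) - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    # First 8 bytes as an integer, scaled into [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    cumulative = 0.0
    for version, share in sorted(split.items()):
        cumulative += share
        if bucket < cumulative:
            return version
    # Floating-point rounding at the upper edge: fall back to the last
    # version in sorted order.
    return max(split)


if __name__ == "__main__":
    split = {"v1": 0.9, "v2": 0.1}  # hypothetical versions and shares
    print(choose_model_version("match-12345", split))
```

Hashing the key instead of drawing a random number keeps assignments stable across service restarts and replicas, which makes the results of the two variants easier to compare.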

Infrastructure Layer

Uses a cloud-native tech stack:

  • Terraform: Manages AWS EKS clusters, VPC, and other resources
  • Kubernetes: Container orchestration
  • Helm: K8s package management
  • ArgoCD: GitOps continuous delivery
  • GitHub Actions: CI pipeline
  • Jenkins: CD pipeline

Section 04

Highlights of Engineering Practices

GitOps Deployment Mode

All K8s resource configurations are stored in Git repositories. ArgoCD watches for changes and syncs them to the cluster automatically, which provides version traceability, fast rollback, permission control, and an audit trail.
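Conceptually, a GitOps controller compares the desired state declared in Git with the live state in the cluster and works out what to change. The sketch below is a simplified illustration of that comparison, not ArgoCD's actual implementation. Resources are modeled as plain dictionaries, and all names are hypothetical.

```python
from typing import Dict, List, Tuple

Manifest = Dict[str, object]


def plan_sync(desired: Dict[str, Manifest],
              live: Dict[str, Manifest]) -> List[Tuple[str, str]]:
    """Return (action, resource_name) pairs that bring live in line with desired."""
    actions: List[Tuple[str, str]] = []
    for name, manifest in sorted(desired.items()):
        if name not in live:
            actions.append(("create", name))
        elif live[name] != manifest:
            actions.append(("update", name))
    for name in sorted(live):
        if name not in desired:
            # Resource was removed from Git, so it is pruned from the cluster.
            actions.append(("delete", name))
    return actions


if __name__ == "__main__":
    desired = {"deployment/predictor": {"image": "predictor:1.1", "replicas": 3}}
    live = {
        "deployment/predictor": {"image": "predictor:1.0", "replicas": 3},
        "configmap/old": {},
    }
    print(plan_sync(desired, live))
```

Because the desired state lives in Git, a rollback in this model is just a revert commit: the controller sees the older manifests as the new desired state and plans the corresponding updates.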

Multi-Environment Management

Development, staging, and production environments are isolated via Terraform workspaces and directory structures. Resources in each environment are independent to avoid interference.

Automated Pipeline

Workflow: Code push → GitHub Actions runs tests → Build Docker image → Jenkins triggers deployment → ArgoCD syncs configurations → K8s rolling update (zero downtime).
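The key property of this workflow is that each stage gates the next: if the tests fail, no image is built and nothing is deployed. Below is a minimal sketch of that gating logic. The stage names follow the workflow above, and the stage bodies are stubbed out as hypothetical callables.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Returns the names of the stages that completed successfully.
    """
    completed: List[str] = []
    for name, step in stages:
        if not step():
            print(f"stage failed: {name}; later stages skipped")
            break
        completed.append(name)
    return completed


if __name__ == "__main__":
    # Each lambda stands in for real work (test run, image build, ...).
    stages: List[Stage] = [
        ("run tests", lambda: True),
        ("build image", lambda: True),
        ("trigger deployment", lambda: True),
        ("sync configuration", lambda: True),
        ("rolling update", lambda: True),
    ]
    print(run_pipeline(stages))
```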


Section 05

Reusable Experience and Recommendations

Data Science Team

  • Consider serving requirements during the model design phase
  • Version data, models, and code
  • Continuously monitor model performance after deployment to catch drift early
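One simple way to catch drift after deployment is to track accuracy over a sliding window of recent predictions with known outcomes, and to flag when it falls well below the offline baseline. The sketch below uses the article's 73% test accuracy as the baseline. The class name, window size, and tolerance are assumptions.

```python
from collections import deque


class AccuracyDriftMonitor:
    """Flag drift when rolling accuracy drops below baseline - tolerance."""

    def __init__(self, baseline: float = 0.73, tolerance: float = 0.05,
                 window: int = 500) -> None:
        self.baseline = baseline
        self.tolerance = tolerance
        self.outcomes = deque(maxlen=window)  # 1 for correct, 0 for wrong

    def record(self, predicted_radiant_win: bool, radiant_won: bool) -> None:
        self.outcomes.append(int(predicted_radiant_win == radiant_won))

    def rolling_accuracy(self) -> float:
        if not self.outcomes:
            raise ValueError("no outcomes recorded yet")
        return sum(self.outcomes) / len(self.outcomes)

    def drifted(self) -> bool:
        # Wait for a full window so a few early misses do not raise an alarm.
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.rolling_accuracy() < self.baseline - self.tolerance
```

A drop after a game patch would be the expected trigger here, since balance changes alter hero strength and can invalidate patterns the model learned from older matches.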

Engineering Team

  • Use Terraform to implement Infrastructure as Code (IaC) to avoid manual configuration
  • Prioritize GitOps declarative deployment over imperative scripts
  • Run automated tests in the CI phase to catch defects early, when they are cheaper to fix

Team Collaboration

  • Break down silos between data scientists and engineers
  • Choose toolchains that the whole team understands
  • Value documentation (README, Makefile, etc.) as engineering assets

Section 06

Project Limitations and Improvement Directions

The project has the following areas for improvement:

  1. Feature engineering can be deepened: Add features such as players' personal styles and recent form
  2. Lack of online learning: The model is trained offline and cannot update automatically based on new data
  3. Insufficient interpretability: The black-box nature of neural networks makes it difficult to explain prediction reasons

Section 07

Conclusion

This project uses Dota2 draft prediction as a scenario to demonstrate a complete machine learning engineering workflow. From data collection to Kubernetes deployment, it embodies MLOps best practices. While the 73% prediction accuracy is not the end goal, the automated pipeline paves the way for the implementation of complex AI applications, making it an excellent reference case for ML projects moving from the lab to production.