Zing Forum

Saudi Car Price Prediction MLOps Project: Complete Practice from Data Crawling to Cloud Deployment

An end-to-end MLOps project for the Saudi Arabian car market, demonstrating how to build automated data pipelines, intelligent training gating mechanisms, and cloud-native deployment architectures to achieve full lifecycle management of machine learning models.

Tags: MLOps, machine learning, price prediction, XGBoost, MongoDB, automated deployment, data pipeline, Saudi market
Published 2026-05-22 21:46; recent activity 2026-05-22 21:49; estimated read 5 min

Section 01

Saudi Car Price Prediction MLOps Project: Full Practice Overview

The open-source project Saudi-Car-Price-MLOps demonstrates an end-to-end MLOps solution for predicting car prices in Saudi Arabia. It covers automated data crawling, data-driven retraining triggers, cloud model registration, deployment monitoring, and full lifecycle management of machine learning models, addressing the challenge of turning lab models into stable production systems.

Section 02

Project Background & Core Objectives

Saudi Arabia is one of the largest car markets in the Middle East, with active trading in both new and used cars, but prices vary widely with brand, model year, mileage, and configuration. Manual pricing is slow and hard to scale. The project therefore aims to build an intelligent pricing system that automates data collection, model retraining, version management, and deployment, following the MLOps principle of combining data-driven decisions with automated operations.

Section 03

Data Layer: Asynchronous Crawling & Smart Storage

The data infrastructure uses hybrid storage: SQLite for local development (fast iteration) and MongoDB Atlas for production (scalability). Every 3 days, listings are crawled asynchronously with Playwright and parsed with BeautifulSoup, following "polite crawling" practices: requests run asynchronously but are spaced out with random delays. MongoDB also acts as the pipeline's control center: pipeline_config stores the metadata that determines when hyperparameter optimization is triggered, and every prediction request is logged together with its metadata for audit tracking.
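
The polite-crawling idea can be sketched in a few lines of asyncio. This is a minimal illustration under stated assumptions, not the project's crawler: polite_crawl, max_concurrency, and the 1 to 3 second delay range are hypothetical, and fetch stands in for whatever actually loads a page (in this project, presumably a Playwright page load whose HTML is then parsed with BeautifulSoup). Requests still run concurrently, but a semaphore caps how many are in flight and each one waits a random interval first.

```python
import asyncio
import random
from typing import Awaitable, Callable, Dict, Iterable

# Hypothetical fetcher type: takes a URL and returns the page HTML.
# In the project this role would presumably be played by a Playwright page load.
Fetcher = Callable[[str], Awaitable[str]]


async def polite_crawl(
    urls: Iterable[str],
    fetch: Fetcher,
    max_concurrency: int = 3,   # hypothetical cap on parallel requests
    min_delay: float = 1.0,     # hypothetical delay range, in seconds
    max_delay: float = 3.0,
) -> Dict[str, str]:
    """Fetch pages concurrently while capping parallelism and adding
    a random pause before each request."""
    semaphore = asyncio.Semaphore(max_concurrency)
    results: Dict[str, str] = {}

    async def worker(url: str) -> None:
        async with semaphore:
            # Random jitter keeps the request rate low and irregular.
            await asyncio.sleep(random.uniform(min_delay, max_delay))
            results[url] = await fetch(url)

    await asyncio.gather(*(worker(u) for u in urls))
    return results
```

Because the fetcher is injected, the scheduling logic can be exercised offline with a stub coroutine, and the delay range can be tuned per site without touching the parsing code.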

Section 04

Training Pipeline: Smart Gating & Dynamic Tuning

Instead of retraining on a fixed schedule, the project uses a "smart training gate": retraining is triggered only after at least 500 new unique records have been added, which saves compute while keeping the model current. When the dataset has grown by more than 50%, the pipeline additionally runs Bayesian hyperparameter tuning with Optuna. The core model is XGBoost, and the notebook MLCarsProjectNotebook.ipynb documents the exploratory data analysis: handling Arabic terms, identifying the key price factors, and validating baselines.
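
The gate reduces to two threshold checks. The sketch below uses the project's stated numbers (500 new unique records, more than 50% growth) but is otherwise an assumption: training_gate and GateDecision are hypothetical names, the row counts are assumed to be deduplicated and to come from metadata such as pipeline_config, growth is measured against the dataset size at the last tuning run, and tuning is only considered when a retrain is already happening.

```python
from dataclasses import dataclass

# Thresholds from the project description.
MIN_NEW_RECORDS = 500      # retrain only after this many new unique rows
TUNE_GROWTH_RATIO = 0.5    # re-tune when data grew by MORE than 50%


@dataclass
class GateDecision:
    retrain: bool
    tune: bool


def training_gate(
    rows_at_last_train: int,
    rows_at_last_tune: int,
    current_rows: int,
) -> GateDecision:
    """Decide whether to retrain and whether to re-run hyperparameter tuning.
    All counts are assumed to be deduplicated (unique) records."""
    new_rows = current_rows - rows_at_last_train
    retrain = new_rows >= MIN_NEW_RECORDS
    # Assumption: growth is relative to the dataset size at the last tuning run.
    growth = (current_rows - rows_at_last_tune) / max(rows_at_last_tune, 1)
    # Assumption: tuning only happens as part of a retrain.
    tune = retrain and growth > TUNE_GROWTH_RATIO
    return GateDecision(retrain=retrain, tune=tune)
```

For example, going from 10,000 to 10,300 rows opens neither gate, while going from 1,000 to 1,500 rows triggers a retrain but not tuning, since 50% growth is not more than 50%. When tune is true, the pipeline would then run an Optuna study (optuna.create_study followed by study.optimize) over the XGBoost hyperparameters; the objective function is not shown here.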

Section 05

Version Management: Forward-Compatible Strategy

Versioning is designed so that older artifacts keep working as the format evolves:

  • v1: a local preprocessor.pkl, kept as a legacy fallback because early preprocessors were not registered in the cloud.
  • v2+: the model and its matching preprocessor are uploaded to DagsHub together as one package, so each version is atomic.
  • CI/CD: GitHub Actions first tries to load the v2+ package from DagsHub and falls back to the local v1 artifacts if that fails, avoiding downtime during the transition.
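
The fallback order can be sketched as follows. This is a hypothetical illustration, not the project's CI code: load_artifacts, load_local_v1, and the file name model.pkl are invented for the example (only preprocessor.pkl is named in the project), and remote_loader stands in for whatever fetches the v2+ package from DagsHub. The remote loader returns the model and preprocessor together or raises, which mirrors the atomic-package idea: a model is never paired with a preprocessor from a different version.

```python
import pickle
from pathlib import Path
from typing import Any, Callable, Tuple

# A loader returns (model, preprocessor) together, or raises on failure.
Loader = Callable[[], Tuple[Any, Any]]


def load_local_v1(artifact_dir: Path) -> Tuple[Any, Any]:
    """Legacy v1 path: artifacts pickled next to the app.
    'model.pkl' is a hypothetical name; 'preprocessor.pkl' comes from the project."""
    with open(artifact_dir / "model.pkl", "rb") as f:
        model = pickle.load(f)
    with open(artifact_dir / "preprocessor.pkl", "rb") as f:
        preprocessor = pickle.load(f)
    return model, preprocessor


def load_artifacts(remote_loader: Loader, artifact_dir: Path) -> Tuple[Any, Any, str]:
    """Try the atomic v2+ package first; fall back to the local v1 files."""
    try:
        model, preprocessor = remote_loader()
        return model, preprocessor, "v2+ (remote)"
    except Exception as exc:
        # Any remote failure (network, auth, missing version) triggers the fallback.
        print(f"Remote load failed: {exc!r}; using local v1 artifacts")
    model, preprocessor = load_local_v1(artifact_dir)
    return model, preprocessor, "v1 (local)"
```

Catching a broad Exception is deliberate in this sketch: any remote failure should degrade to the legacy artifacts rather than break the deployment, and the returned label records which path was taken.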

Section 06

Deployment Architecture: Containerization & Monitoring

The system is containerized with Docker. GitHub Actions runs the CI/CD pipeline, and the app is deployed to Render only after the automated tests pass. It offers three interfaces:

  • Gradio: Real-time price query for end users.
  • Streamlit: Dashboard for market trends, data distribution, and logs (for operators).
  • FastAPI: Standard API for third-party integration.
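
Since all three interfaces serve the same model, one way to keep them consistent (an assumption about design, not something the project documents) is to route every request through a single function that predicts and writes the audit record described in the data layer. In this sketch, CarFeatures, its fields, and predict_and_log are hypothetical; predict_fn would wrap the loaded preprocessor and XGBoost model, and log_sink is any callable that stores a record.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Any, Callable, Dict


@dataclass
class CarFeatures:
    # Hypothetical input schema; fields mirror price factors named in this article.
    brand: str
    year: int
    mileage_km: float


def predict_and_log(
    features: CarFeatures,
    predict_fn: Callable[[CarFeatures], float],
    model_version: str,
    log_sink: Callable[[Dict[str, Any]], Any],
    source: str,
) -> float:
    """Shared entry point for the Gradio, Streamlit, and FastAPI handlers:
    validate the input, predict, then write one audit record."""
    if features.mileage_km < 0:
        raise ValueError("mileage_km must be non-negative")
    price = float(predict_fn(features))
    log_sink({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "source": source,  # which interface served the request
        "model_version": model_version,
        "features": asdict(features),
        "predicted_price": price,
    })
    return price
```

With pymongo, a collection's insert_one method could be passed directly as log_sink, while tests can pass list.append. Each front end then only converts its own input format (a Gradio form, a Streamlit widget, a FastAPI request body) into CarFeatures.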

Section 07

Limitations, Future Directions & Key Insights

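Limitations: predictions are more reliable for used cars than for new cars, because the dataset contains more used-car listings than new-car listings. Future directions: improve new-car prediction by collecting more diverse data. Key insights: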

  1. Prioritize automation (data to deployment).
  2. Use data thresholds instead of fixed schedules.
  3. Forward-compatible versioning for smooth upgrades.
  4. Full observability (data distribution to logs).
  5. Seamless switching between local and cloud environments (development efficiency plus production stability).

Note: This project is for educational and research purposes only; its predictions should not be used as the sole basis for decisions.