Zing Forum

Saudi Car Price Prediction MLOps Project: A Complete Walkthrough from Data Scraping to Cloud Deployment

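An end-to-end MLOps project for the Saudi Arabian car market that shows how to build an automated data pipeline, a smart training gate, and a cloud-native deployment architecture to manage the full lifecycle of a machine learning model.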

Tags: MLOps, machine learning, price prediction, XGBoost, MongoDB, automated deployment, data pipeline, Saudi market
Published 2026/05/22 21:46 · Last activity 2026/05/22 21:49 · Estimated reading time: 5 minutes

Section 01

Saudi Car Price Prediction MLOps Project: End-to-End Overview

The open-source project Saudi-Car-Price-MLOps demonstrates an end-to-end MLOps solution for predicting car prices in Saudi Arabia. It covers automated data crawling, threshold-based training triggers, cloud model registration, deployment, and monitoring, so that the full lifecycle of the model is managed. The underlying challenge it addresses is turning a lab model into a stable production system.

Section 02

Project Background & Core Objectives

Saudi Arabia is one of the largest car markets in the Middle East, with active trading in both new and used cars. Prices are volatile because they depend on several factors at once (brand, year, mileage, configuration), and traditional manual pricing is inefficient and hard to scale. The project aims to build an intelligent pricing system with automated data collection, model retraining, version management, and deployment. This reflects the MLOps approach of combining data-driven decisions with automated operations.

Section 03

Data Layer: Asynchronous Crawling & Smart Storage

The data infrastructure uses hybrid storage: SQLite for local development (fast iteration) and MongoDB Atlas for production (scalability). Data is crawled asynchronously every 3 days with Playwright and BeautifulSoup, following "polite crawling" practices (random delays between asynchronous requests). MongoDB also acts as the control center: the pipeline_config collection stores the metadata used to trigger hyperparameter optimization, and every prediction request is logged with its metadata for audit tracking.
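
The "polite crawling" idea can be illustrated with a minimal sketch using only the standard library. It is not the project's crawler: the function `polite_crawl`, its parameters, and the `fake_fetch` stand-in are hypothetical names, and a real `fetch` would wrap a Playwright page load plus BeautifulSoup parsing.

```python
import asyncio
import random
from typing import Awaitable, Callable, Iterable, List


async def polite_crawl(
    urls: Iterable[str],
    fetch: Callable[[str], Awaitable[str]],
    min_delay: float = 1.0,   # assumed delay bounds, not taken from the project
    max_delay: float = 3.0,
    max_concurrency: int = 2,  # assumed cap on simultaneous requests
) -> List[str]:
    """Fetch every URL with bounded concurrency and a random pause before each request."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(url: str) -> str:
        async with semaphore:
            # Random delay so requests do not hit the site in a regular burst.
            await asyncio.sleep(random.uniform(min_delay, max_delay))
            return await fetch(url)

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(fetch_one(u) for u in urls))


async def fake_fetch(url: str) -> str:
    """Hypothetical stand-in for a real page fetch (e.g. a Playwright page load)."""
    return f"<html>{url}</html>"
```

Passing the fetch function in as a parameter keeps the rate-limiting logic separate from the browser automation, which makes it easy to exercise without network access, for example `asyncio.run(polite_crawl(["a", "b"], fake_fetch))`.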

Section 04

Training Pipeline: Smart Gating & Dynamic Tuning

Instead of retraining on a fixed schedule, the project uses a "smart training gate": retraining is triggered only once 500 new unique records have been added, which saves compute while keeping the model current. When the dataset has grown by more than 50%, Optuna is additionally used for Bayesian hyperparameter tuning. The core model is XGBoost, and the notebook MLCarsProjectNotebook.ipynb documents the exploratory data analysis (handling of Arabic terms, key price factors, and baseline validation).
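
The gating logic can be sketched as a small pure function. Only the two thresholds (500 records, 50% growth) come from the description above; everything else is an assumption made for illustration: the names `training_gate` and `GateDecision`, measuring growth against the dataset size at the last tuning run, and tuning only when a retrain is already due.

```python
from dataclasses import dataclass

NEW_RECORD_THRESHOLD = 500  # from the article: retrain after 500 new unique records
GROWTH_THRESHOLD = 0.50     # from the article: tune when data grows by more than 50%


@dataclass
class GateDecision:
    retrain: bool
    tune_hyperparameters: bool


def training_gate(
    current_count: int,
    count_at_last_training: int,
    count_at_last_tuning: int,
) -> GateDecision:
    """Decide whether to retrain and whether to also run hyperparameter tuning.

    All counts are numbers of unique records. The growth baseline (size at the
    last tuning run) is an assumption; the article does not state it.
    """
    new_records = current_count - count_at_last_training
    retrain = new_records >= NEW_RECORD_THRESHOLD

    if count_at_last_tuning > 0:
        growth = (current_count - count_at_last_tuning) / count_at_last_tuning
    else:
        # No earlier tuning run: treat any data at all as unbounded growth.
        growth = float("inf") if current_count > 0 else 0.0

    # Assumption: tuning only happens as part of a retraining run.
    tune = retrain and growth > GROWTH_THRESHOLD
    return GateDecision(retrain=retrain, tune_hyperparameters=tune)
```

In a setup like the one described, the two baseline counts would presumably be read from the pipeline_config metadata in MongoDB and updated after each training or tuning run.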

Section 05

Version Management: Forward-Compatible Strategy

Version management is designed so that older and newer artifacts can coexist:

  • v1: A local preprocessor.pkl, kept as a legacy fallback because early preprocessors were not registered in the cloud.
  • v2+: The model and its matching preprocessor are uploaded to DagsHub together as one package, so each version is atomic.
  • CI/CD: GitHub Actions first tries DagsHub (v2+) and falls back to the local v1 artifact, so transitions cause no downtime.
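
The "try the registry first, then fall back to local" behavior could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's loader: `load_with_fallback` and the two loader callables are hypothetical, and the actual DagsHub download and pickle loading are left to the caller.

```python
import logging
from typing import Any, Callable, Tuple

logger = logging.getLogger("artifact_loader")


def load_with_fallback(
    load_remote: Callable[[], Any],
    load_local: Callable[[], Any],
) -> Tuple[Any, str]:
    """Return (artifact, source), preferring the remote registry.

    load_remote would fetch a v2+ package from the registry; load_local would
    read the legacy v1 file from disk. Both are supplied by the caller.
    """
    try:
        return load_remote(), "remote"
    except Exception as exc:
        # Deliberately broad in this sketch: any registry failure (network,
        # authentication, missing version) should trigger the local fallback.
        logger.warning("Remote artifact unavailable (%s); using local fallback", exc)
        return load_local(), "local"
```

Returning the source alongside the artifact lets the caller log which version path actually served a deployment, which helps when checking that the fallback is not being used silently.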

Section 06

Deployment Architecture: Containerization & Monitoring

The system is containerized with Docker. GitHub Actions handles CI/CD and deploys to Render only after the automated tests pass. The system offers three interfaces:

  • Gradio: Real-time price query for end users.
  • Streamlit: Dashboard for market trends, data distribution, and logs (for operators).
  • FastAPI: Standard API for third-party integration.

Section 07

Limitations, Future Directions & Key Insights

Limitations: Predictions are more accurate for used cars than for new cars, because more used-car data is available. Future: Improve new-car prediction by collecting more diverse data. Key Insights:

  1. Prioritize automation (data to deployment).
  2. Use data thresholds instead of fixed schedules.
  3. Forward-compatible versioning for smooth upgrades.
  4. Full observability (data distribution to logs).
  5. Seamless local-cloud switch (dev efficiency + production stability).

Note: For educational and research purposes only. Do not use the predictions as the sole basis for a decision.