Zing Forum

Weather Data-Integrated New York Taxi Smart Analysis System: End-to-End Practice from ETL to Prediction

This article introduces an open-source taxi and weather data analysis platform. It builds an ETL pipeline using Apache Airflow, combines PostgreSQL data warehouse, Power BI visualization, and machine learning prediction to provide a complete solution for urban travel demand analysis.

Tags: Data Analysis, ETL, Apache Airflow, Machine Learning, Power BI, Taxi, Weather Data, PostgreSQL, Demand Forecasting
Published 2026-05-04 18:15 | Recent activity 2026-05-04 18:24 | Estimated read 6 min

Section 01

Overview: An End-to-End Weather-Integrated New York Taxi Analysis System

This article introduces the open-source project taxi-weather-analytics, which integrates New York taxi data with weather data to build an end-to-end analysis platform that runs from ETL to prediction. The core tech stack consists of Apache Airflow (ETL scheduling), PostgreSQL (data warehouse), Power BI (visualization), and Python machine learning libraries (demand prediction). The project targets a common gap in urban travel demand prediction, namely that weather factors are ignored, and aims to provide decision support for traffic planners and taxi operating companies.

Section 02

Project Background: Impact of Weather on Taxi Demand and Limitations of Traditional Analysis

Traditional taxi demand analysis looks only at historical order data and ignores weather, a key influencing factor. Studies show that taxi demand increases by 30%-50% on rainy days; extreme temperatures make people less willing to walk, which raises demand; and seasonal changes and special weather events also alter travel patterns. Building on this insight, the taxi-weather-analytics project provides an analysis platform that integrates both types of data to fill the gaps left by traditional methods.

Section 03

Technical Architecture and ETL Process Design

The project adopts an end-to-end layered architecture:

  1. Core Components: Apache Airflow (task scheduling/orchestration), PostgreSQL (data storage), Power BI (visualization), Python ML libraries (prediction);
  2. Data Flow: Data collection (taxi + weather data sources) → ETL processing (scheduled cleaning and transformation via Airflow) → Storage (PostgreSQL) → Analysis and display (Power BI) → Prediction (ML model);
  3. ETL Details: Extraction (New York open data + weather API), Transformation (cleaning/standardization/feature engineering/association), Loading (writing to PostgreSQL and query optimization).
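The transformation and loading steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the field names (`pickup_time`, `zone_id`, `observed_at`, `temp_c`, `precip_mm`) and the `hourly_demand` table are hypothetical, the standard-library `sqlite3` module stands in for PostgreSQL so the sketch runs anywhere, and the Airflow task wiring is omitted.

```python
import sqlite3
from collections import Counter
from datetime import datetime


def hour_bucket(ts: str) -> str:
    """Truncate an ISO-8601 timestamp to the start of its hour."""
    dt = datetime.fromisoformat(ts)
    return dt.replace(minute=0, second=0, microsecond=0).isoformat()


def transform(trips, weather):
    """Clean trips, count pickups per (hour, zone), and attach hourly weather.

    Field names are assumptions for illustration only.
    """
    # Cleaning: drop records that lack a pickup time or a zone.
    counts = Counter(
        (hour_bucket(t["pickup_time"]), t["zone_id"])
        for t in trips
        if t.get("pickup_time") and t.get("zone_id") is not None
    )
    # Association: index weather observations by the hour they fall in.
    weather_by_hour = {hour_bucket(w["observed_at"]): w for w in weather}

    rows = []
    for (hour, zone), pickups in sorted(counts.items()):
        w = weather_by_hour.get(hour)
        if w is None:
            continue  # no weather observation for this hour; skip the row
        rows.append((hour, zone, pickups, w["temp_c"], w["precip_mm"]))
    return rows


def load(conn, rows):
    """Write rows idempotently (sqlite3 here; PostgreSQL in the project)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hourly_demand ("
        "hour TEXT, zone_id INTEGER, pickups INTEGER, "
        "temp_c REAL, precip_mm REAL, PRIMARY KEY (hour, zone_id))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO hourly_demand VALUES (?, ?, ?, ?, ?)", rows
    )
    conn.commit()
```

Keying the table on `(hour, zone_id)` means a rerun of the same scheduled interval overwrites rows instead of duplicating them, which is a common choice for scheduled ETL jobs.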

Section 04

Value of Data Integration and Visual Analysis Scenarios

  1. Value of Data Integration: Combining weather data can explain the reasons for demand changes (e.g., surge in demand on rainy days), but challenges such as time/space alignment, data quality, and real-time performance need to be addressed;
  2. Visualization Capabilities: Power BI supports time trends, geographic heatmaps, weather impact comparisons, and operational indicator monitoring;
  3. Typical Scenarios: Operational optimization (peak scheduling), pricing strategy (dynamic pricing), resource allocation (capacity deployment), anomaly detection (problem identification).
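As a rough illustration of the weather impact comparison mentioned above, the sketch below compares average hourly pickups in rainy and dry hours. The row layout (`pickups`, `precip_mm`) and the 0.1 mm rain threshold are assumptions made for this example, not values taken from the project.

```python
from statistics import mean


def rain_uplift(hourly_rows, rain_threshold_mm=0.1):
    """Return (rainy_mean, dry_mean, uplift) for hourly pickup counts.

    uplift is the relative change of rainy-hour demand over dry-hour
    demand, e.g. 0.5 means rainy hours see 50% more pickups on average.
    """
    rainy = [r["pickups"] for r in hourly_rows
             if r["precip_mm"] >= rain_threshold_mm]
    dry = [r["pickups"] for r in hourly_rows
           if r["precip_mm"] < rain_threshold_mm]
    if not rainy or not dry:
        raise ValueError("need at least one rainy and one dry hour")

    rainy_mean, dry_mean = mean(rainy), mean(dry)
    if dry_mean == 0:
        raise ValueError("dry-hour mean is zero; uplift is undefined")
    return rainy_mean, dry_mean, rainy_mean / dry_mean - 1.0
```

A raw comparison like this mixes hours of the day, so a fairer version would compare rainy and dry hours within the same hour-of-day and day-of-week group before averaging.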

Section 05

Machine Learning Prediction Model Design

  1. Prediction Objectives: Short-term regional demand prediction, weather impact quantification, abnormal travel pattern detection;
  2. Feature Engineering: Time features (hour/week/holiday), historical features (past demand trends), weather features (temperature/humidity/precipitation/weather conditions), geographic features (area code/POI density);
  3. Model Selection: Time series models (ARIMA/Prophet), ensemble learning (Random Forest/Gradient Boosting), deep learning (LSTM), etc.
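The feature groups listed above could be assembled into a single feature row along the following lines. This is a sketch under assumptions: the function name `build_features`, the feature keys, the three-hour rolling window, and the weather field names are all hypothetical, and POI density is left out because it needs an external dataset.

```python
from datetime import datetime


def build_features(hour_iso, zone_id, weather, history, holidays=frozenset()):
    """Build one feature row for a (hour, zone) pair.

    hour_iso : ISO-8601 timestamp of the hour being predicted
    weather  : dict with temp_c, humidity_pct, precip_mm (assumed names)
    history  : pickup counts for the preceding hours, oldest first
    holidays : set of datetime.date objects treated as holidays
    """
    ts = datetime.fromisoformat(hour_iso)
    recent = history[-3:]  # last three hours of observed demand
    return {
        # Time features
        "hour": ts.hour,
        "day_of_week": ts.weekday(),  # Monday = 0
        "is_weekend": int(ts.weekday() >= 5),
        "is_holiday": int(ts.date() in holidays),
        # Historical features
        "lag_1h": history[-1] if history else 0,
        "rolling_mean_3h": sum(recent) / len(recent) if recent else 0.0,
        # Weather features
        "temp_c": weather["temp_c"],
        "humidity_pct": weather["humidity_pct"],
        "precip_mm": weather["precip_mm"],
        "is_raining": int(weather["precip_mm"] > 0),
        # Geographic feature
        "zone_id": zone_id,
    }
```

Rows in this shape can be fed to tree-based models such as Random Forest or Gradient Boosting after the categorical fields are encoded; sequence models such as LSTM would need the history as an ordered window instead of summary statistics.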

Section 06

Practical Application Value of the Project

  1. Taxi Companies: Optimize scheduling to reduce empty driving rate, balance supply and demand via dynamic pricing, improve service satisfaction;
  2. Urban Planning: Identify congestion hotspots to optimize road design, complement public transportation, emergency deployment for extreme weather;
  3. Passengers: Get recommendations for optimal travel time, price expectations, and waiting time estimates.

Section 07

Project Expansion and Open-Source Community Support

  1. Expansion and Customization: Adapt to other cities (modify data sources/geocoding), customize models (add new features/algorithms), customize visualization (add new charts/dashboards);
  2. Open-Source Ecosystem: Transparent and modifiable code, GitHub community support, continuous updates, and free to use. The project is suitable for data engineers, traffic planners, and related developers to learn from and apply.