Zing Forum

StormSense: A Practical Implementation of a Random Forest-Based Weather Prediction Model

A machine learning project that tackles class imbalance in weather prediction, using feature engineering and SMOTE oversampling to improve predictions for rare conditions such as fog.

Tags: StormSense, weather prediction, random forest, class imbalance, SMOTE, time series, machine learning, feature engineering, Scikit-Learn
Published 2026-06-17 09:13 | Recent activity 2026-06-17 09:24 | Estimated read 4 min

Section 01

StormSense Project Guide: A Practical Solution to Class Imbalance in Weather Prediction

StormSense is a random forest weather prediction model developed by Tyler Lewinski. It targets class imbalance in weather data, where conditions such as fog appear far less often than rain or sun. The project combines feature engineering (rolling time-series statistics), SMOTE oversampling, and a time-aware train/test split to improve predictions for rare conditions, which matter most in fields such as aviation and transportation.


Section 02

Project Background: The Dilemma of Class Imbalance in Weather Prediction

Weather data usually follows a long-tailed distribution: rainy and sunny days are common, while foggy days are rare. A naive model can score high overall accuracy by predicting the common classes well while almost never identifying the rare ones. Yet rare conditions such as fog are exactly the ones that matter for aviation and transportation. StormSense addresses this directly: rather than chasing a misleadingly high overall accuracy, it focuses on prediction performance for the rare classes.
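The gap between overall accuracy and rare-class performance can be seen in a small sketch. The label counts and the "never predicts fog" model below are made up purely for illustration and are not taken from the StormSense data.

```python
# Hypothetical label counts, chosen only to mimic a long-tailed distribution.
labels = ["rain"] * 600 + ["sun"] * 370 + ["fog"] * 30

# A hypothetical model that is perfect on common classes but never predicts fog:
# every foggy day is mislabelled as rain.
predictions = [label if label != "fog" else "rain" for label in labels]

def accuracy(y_true, y_pred):
    """Fraction of samples whose prediction matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, cls):
    """Fraction of true `cls` samples that were predicted as `cls`."""
    relevant = [(t, p) for t, p in zip(y_true, y_pred) if t == cls]
    return sum(t == p for t, p in relevant) / len(relevant)

overall_accuracy = accuracy(labels, predictions)
fog_recall = recall(labels, predictions, "fog")
print(f"accuracy={overall_accuracy:.2f}, fog recall={fog_recall:.2f}")
```

Here the model looks strong on accuracy alone while catching no foggy days at all, which is why per-class metrics are the more informative yardstick for this kind of problem.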


Section 03

Technical Implementation: Combination of Feature Engineering, SMOTE, and Random Forest

  1. Data preprocessing: parse dates, extract time features (month, day), and encode the weather labels;
  2. Feature engineering: compute 14-day rolling means and standard deviations to capture weather trends;
  3. SMOTE: synthesize additional foggy-day samples to balance the classes;
  4. Model selection: a random forest with tuned hyperparameters (for example, n_estimators between 500 and 700 and max_depth=12);
  5. Time-aware splitting: train on pre-2015 data and test on post-2015 data to avoid temporal leakage.
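The preprocessing, rolling-feature, and splitting steps above might look roughly like the following pandas sketch. The synthetic data, the column names (`date`, `temp_max`, `weather`), the exact cutoff date, and the use of `shift(1)` are all assumptions made for illustration; the original project's code may differ.

```python
import numpy as np
import pandas as pd

# Synthetic daily data standing in for the real dataset (column names are assumptions).
rng = np.random.default_rng(0)
dates = pd.date_range("2014-01-01", "2015-12-31", freq="D")
df = pd.DataFrame({
    "date": dates,
    "temp_max": rng.normal(15.0, 5.0, len(dates)),
    "weather": rng.choice(["rain", "sun", "fog"], size=len(dates), p=[0.5, 0.45, 0.05]),
})

# Time features and integer-encoded labels.
df["month"] = df["date"].dt.month
df["day"] = df["date"].dt.day
df["label"] = df["weather"].astype("category").cat.codes

# 14-day rolling statistics. shift(1) keeps the current day's value out of its
# own feature; whether the original project does this is not stated.
past = df["temp_max"].shift(1)
df["temp_max_roll_mean_14"] = past.rolling(window=14).mean()
df["temp_max_roll_std_14"] = past.rolling(window=14).std()
df = df.dropna().reset_index(drop=True)  # drop rows without a full 14-day history

# Time-aware split: earlier dates train, later dates test, no shuffling.
cutoff = pd.Timestamp("2015-01-01")  # assumed boundary
train = df[df["date"] < cutoff]
test = df[df["date"] >= cutoff]
print(len(train), len(test))
```

Splitting by date rather than at random means no test-set day precedes a training-set day, so the rolling features of a test row never sit inside the training period's future.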

Section 04

Experimental Results: Model Performance and Prediction of Rare Classes

On the test set, overall accuracy is 84% with foggy days included and 94% with foggy days excluded. Per-class F1 scores are 0.97 for rainy days, 0.84 for sunny days, and 0.38 for foggy days. The low foggy-day F1 reflects the scarcity of real samples: SMOTE helps, but because it only interpolates between existing minority samples, it cannot create genuinely new patterns.
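For readers unfamiliar with these metrics, here is a minimal sketch of how per-class F1 and accuracy with and without one class could be computed. The toy labels are invented and do not reproduce the reported numbers, and treating "excluding foggy days" as dropping rows whose true label is fog is an assumption.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 = harmonic mean of precision and recall for a single class."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def accuracy(y_true, y_pred, exclude=None):
    """Accuracy over all rows, or over rows whose true label is not `exclude`."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != exclude]
    return sum(t == p for t, p in pairs) / len(pairs)

# Toy labels, made up for illustration only.
y_true = ["rain", "rain", "rain", "sun", "sun", "sun", "fog", "fog", "rain", "sun"]
y_pred = ["rain", "rain", "rain", "sun", "sun", "rain", "rain", "fog", "rain", "sun"]

for cls in ("rain", "sun", "fog"):
    print(cls, round(f1_for_class(y_true, y_pred, cls), 3))
print("accuracy (all):", accuracy(y_true, y_pred))
print("accuracy (no fog):", accuracy(y_true, y_pred, exclude="fog"))
```

Even in this tiny example, dropping the rare class raises accuracy, which mirrors the direction of the 84% versus 94% gap described above.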


Section 05

Key Lessons and Project Summary

  1. Feature engineering (rolling time-series features) improves prediction quality;
  2. SMOTE alleviates class imbalance but is not a panacea;
  3. Time-aware splitting keeps the evaluation honest;
  4. The project shows how to build an effective prediction system under real-world constraints, with useful lessons for machine learning learners, such as understanding the data distribution and the value of feature engineering.
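The second lesson is easier to see with a stripped-down version of the SMOTE idea. The function below is a hand-rolled NumPy sketch, not the imbalanced-learn implementation and not the project's code; the function name, the two feature columns, and the sample values are hypothetical.

```python
import numpy as np

def smote_sketch(X_minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between minority neighbours."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X_minority, dtype=float)
    # Pairwise Euclidean distances between minority samples.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    base = rng.integers(0, len(X), size=n_new)                  # random base sample
    picked = neighbours[base, rng.integers(0, k, size=n_new)]   # one of its k neighbours
    gap = rng.random((n_new, 1))                                # position along the segment
    return X[base] + gap * (X[picked] - X[base])

# Hypothetical foggy-day rows: [visibility, humidity]; values are invented.
fog = np.array([[2.0, 95.0], [3.0, 97.0], [1.5, 99.0], [2.5, 93.0]])
synthetic = smote_sketch(fog, n_new=20)
print(synthetic.min(axis=0), synthetic.max(axis=0))
```

Every synthetic row lies on a line segment between two real foggy-day rows, so the new samples never leave the region the real samples already cover. That is why oversampling can rebalance class counts without adding information about fog patterns the dataset has never seen.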

Section 06

Limitations and Improvement Directions

Current limitations: the model only applies to Seattle, relies on a narrow set of features, and foggy-day prediction still needs improvement. Improvement directions: integrate satellite and radar data, try XGBoost or deep learning, use SMOTE variants such as Borderline-SMOTE, and ensemble multiple models.