First Stage: U.S. Data Preparation
County-level outage events for 2023 (15-minute resolution, 26 million rows) are obtained from the U.S. Department of Energy's EAGLE-I dataset, and matching hourly weather data is retrieved via the Open-Meteo Archive API. Preprocessing consists of four steps: selecting 6 U.S. states with similar climates (e.g., Texas), acquiring weather data for 20 cities, fusing the outage and weather data by Haversine distance matching, and engineering 13 initial features (v1).
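The Haversine matching step can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes each county is represented by a centroid and is assigned the weather series of the closest city, which the text above does not state explicitly. The function names, the two cities, and all coordinates are hypothetical and approximate.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_city(lat, lon, cities):
    """Return the name of the city closest to the given point.

    `cities` maps a city name to its (lat, lon) tuple.
    """
    return min(cities, key=lambda name: haversine_km(lat, lon, *cities[name]))


# Illustrative, approximate coordinates (assumptions, not project data).
cities = {"Houston": (29.76, -95.37), "Dallas": (32.78, -96.80)}
county_centroid = (30.27, -97.74)  # roughly central Texas

matched = nearest_city(*county_centroid, cities)
print(matched)
```

With 20 weather cities, a linear scan per county like this is cheap; a spatial index would only matter for far larger station sets.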
Second Stage: Model Training and Optimization
XGBoost is used because it is efficient on tabular data, provides built-in feature importance, and supports cost-sensitive learning. Training applies a cost-sensitive strategy (scale_pos_weight=6.37) over two iterations:
- v1 model: 13 features (heat index, season markers, etc.)
- v2 model: 13 additional features (gusts, rolling temperature, etc.), validated as effective using MRMR (minimum redundancy, maximum relevance) feature selection
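The text does not say how scale_pos_weight=6.37 was chosen. A common heuristic, suggested in the XGBoost documentation, is the ratio of negative to positive training samples, so the sketch below assumes that derivation. The helper name and the toy label counts are hypothetical and were picked only to reproduce the reported value.

```python
def scale_pos_weight(labels):
    """Ratio of negative to positive samples for binary labels (0 or 1)."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0:
        raise ValueError("no positive samples; the ratio is undefined")
    return negatives / positives


# Toy labels (assumption): 100 outage rows and 637 non-outage rows.
labels = [1] * 100 + [0] * 637
weight = scale_pos_weight(labels)

# Parameter dictionary in the form XGBoost accepts; the objective is an assumption.
params = {"objective": "binary:logistic", "scale_pos_weight": weight}
print(params)
```

Under this heuristic, a weight of 6.37 would correspond to positive (outage) rows making up roughly 14% of the training set.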
v2 model performance: accuracy 74.4%, recall 51.6%, precision 27.0%, F1=0.354. Model tuning prioritizes recall (fewer missed alerts) at the cost of precision (more false alerts).
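As a consistency check on the reported metrics, F1 is the harmonic mean of precision and recall. The short sketch below recomputes it from the rounded percentages above; the function name is illustrative.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values as reported for the v2 model.
f1 = f1_from_precision_recall(precision=0.270, recall=0.516)
print(round(f1, 2))
```

The result is approximately 0.35, in line with the reported F1. Any difference in the third decimal place is expected, because the inputs here are already rounded.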