Zing Forum


SeismicID: Indonesia Earthquake Probability Prediction System Integrating Physical Features and Machine Learning

SeismicID is a FastAPI-based system for predicting earthquake probabilities in Indonesia, integrating USGS and BMKG data. It uses an ensemble of XGBoost and LightGBM models, combined with physical features such as fault distance and plate depth, to estimate earthquake risk over multiple time windows (7, 14, 30, and 60 days).

Earthquake prediction, Machine learning, FastAPI, XGBoost, LightGBM, Probability model, Indonesia, USGS, BMKG, Physical features
Published 2026-05-30 07:14 · Recent activity 2026-05-30 07:18 · Estimated read 8 min

Section 01

SeismicID: Core Guide to the Indonesia Earthquake Probability Prediction System

SeismicID is an earthquake probability prediction system for Indonesia, built with FastAPI and drawing on USGS and BMKG data. It combines XGBoost and LightGBM in an ensemble that uses both physical features (such as fault distance and plate depth) and seismological features to estimate earthquake probabilities for multiple time windows (7, 14, 30, and 60 days). The project is maintained by erzanugroho and open-sourced on GitHub. Its goal is to help identify high-risk areas through data science and machine learning, with an emphasis on scientific rigor and on interpreting probabilities correctly.


Section 02

Project Background and Development Motivation

Indonesia lies where the Pacific Ring of Fire meets the Alpine-Himalayan seismic belt, so damaging earthquakes are frequent. Traditional early warning relies on real-time monitoring of events as they happen; SeismicID instead focuses on probabilistic forecasting, helping to identify areas at elevated risk in the period ahead. Developed by erzanugroho, the project uses a FastAPI backend, integrates USGS (global) and BMKG (Indonesian national) data, and puts particular weight on model calibration, probability interpretation, and uncertainty quantification to strengthen its scientific rigor.


Section 03

System Architecture and Core Technical Details

Data layer: historical data is stored in Parquet, runtime data in SQLite, and geographic data covers administrative divisions and subduction zones. USGS ComCat is the primary catalog and BMKG TEWS a supplementary source; records from both go through deduplication and cleaning.

Spatial model: Indonesia is divided into a grid of 0.5° × 0.5° cells (about 3,000 cells), each linked to a province and subregion.

Feature engineering: more than 25 features are computed. Seismological features include b-values over multiple time windows, estimates of the magnitude of completeness (Mc), and inter-event times; physical features include fault distance, fault type and slip rate, and subduction-zone depth. The physical features are a key innovation: they bring geological knowledge into the model and make its predictions easier to interpret.
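
As a rough illustration of the spatial model, the sketch below assigns an epicentre to a 0.5° cell. The bounding box (95°E to 141°E, 11°S to 6°N) and the names `Cell` and `cell_for` are hypothetical choices, not taken from the SeismicID code. With this box the grid has 92 × 34 = 3,128 cells, in line with the roughly 3,000 cells described above.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

# Hypothetical study-region bounding box (not taken from the project).
LON_MIN, LON_MAX = 95.0, 141.0
LAT_MIN, LAT_MAX = -11.0, 6.0
CELL_DEG = 0.5  # grid resolution stated in the text

N_COLS = int(round((LON_MAX - LON_MIN) / CELL_DEG))  # 92 columns
N_ROWS = int(round((LAT_MAX - LAT_MIN) / CELL_DEG))  # 34 rows


@dataclass(frozen=True)
class Cell:
    row: int  # counted northward from LAT_MIN
    col: int  # counted eastward from LON_MIN

    @property
    def cell_id(self) -> int:
        """Row-major integer id, usable as a key for per-cell features."""
        return self.row * N_COLS + self.col

    @property
    def center(self) -> tuple[float, float]:
        """(lat, lon) of the cell centre."""
        return (LAT_MIN + (self.row + 0.5) * CELL_DEG,
                LON_MIN + (self.col + 0.5) * CELL_DEG)


def cell_for(lat: float, lon: float) -> Cell | None:
    """Return the 0.5-degree cell containing (lat, lon), or None outside the box."""
    if not (LAT_MIN <= lat < LAT_MAX and LON_MIN <= lon < LON_MAX):
        return None
    row = math.floor((lat - LAT_MIN) / CELL_DEG)
    col = math.floor((lon - LON_MIN) / CELL_DEG)
    return Cell(row, col)


# Example: an epicentre near Jakarta
cell = cell_for(-6.2, 106.8)
print(cell, cell.cell_id, cell.center)  # Cell(row=9, col=23) 851 (-6.25, 106.75)
```

Linking each cell to a province or subregion would then be a lookup keyed on `cell_id` (or a point-in-polygon test on the cell centre), which is not shown here.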


Section 04

Machine Learning Model Design and Calibration

Multi-task framework: the model simultaneously predicts 4 time windows (7/14/30/60 days) × 4 magnitude thresholds (M≥4.5/5.0/5.5/6.0), producing 16 probabilities per grid cell.

Ensemble strategy: three layers. The base layer (XGBoost + LightGBM) captures nonlinear patterns; the baseline layer (a Poisson process and the ETAS model in Ogata's formulation) provides physically interpretable statistical reference forecasts; and the fusion layer combines them by weighted averaging and Bayesian fusion.

Calibration and evaluation: Platt scaling, isotonic regression, and Beta calibration are all available, and the method with the best (lowest) Brier score is selected. Evaluation uses ROC-AUC, the Brier score, reliability diagrams, and related metrics, so that the outputs carry a frequency meaning: among all cases assigned a 10% probability, roughly 10% should actually see an event.
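
A minimal sketch of the Poisson baseline and the Brier-score selection step, assuming each cell has a stationary Poisson rate for M≥4.5 that is scaled to higher thresholds with the Gutenberg-Richter relation. The function names, the per-day rate input, and the default b = 1.0 are illustrative assumptions; the project's ETAS component and its actual fitted calibrators are not reproduced here.

```python
from __future__ import annotations

import math
from typing import Callable, Mapping, Sequence

WINDOWS_DAYS = (7, 14, 30, 60)
THRESHOLDS = (4.5, 5.0, 5.5, 6.0)


def poisson_baseline(rate_m45_per_day: float,
                     b_value: float = 1.0) -> dict[tuple[int, float], float]:
    """Return the 16 probabilities P(at least one M>=m event within t days) for one cell.

    Assumes a stationary Poisson process with a known M>=4.5 daily rate, scaled to
    higher thresholds by the Gutenberg-Richter law:
        rate(m) = rate(4.5) * 10 ** (-b * (m - 4.5))
    """
    probs = {}
    for m in THRESHOLDS:
        rate_m = rate_m45_per_day * 10 ** (-b_value * (m - 4.5))
        for t in WINDOWS_DAYS:
            # P(N >= 1) = 1 - P(N = 0) = 1 - exp(-rate * t)
            probs[(t, m)] = 1.0 - math.exp(-rate_m * t)
    return probs


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("need non-empty, equal-length inputs")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def pick_calibrator(candidates: Mapping[str, Callable[[float], float]],
                    raw_scores: Sequence[float],
                    outcomes: Sequence[int]) -> str:
    """Name of the (already fitted) calibration map with the lowest Brier score
    on a held-out set, e.g. among Platt, isotonic and Beta calibrators."""
    return min(candidates,
               key=lambda name: brier_score([candidates[name](s) for s in raw_scores],
                                            outcomes))


p = poisson_baseline(rate_m45_per_day=0.01)  # about one M>=4.5 event per 100 days
print(len(p), round(p[(30, 4.5)], 3))  # 16 0.259
```

For example, a cell averaging one M≥4.5 event per 100 days gets P(M≥4.5 within 30 days) = 1 − e^(−0.3) ≈ 0.26. A baseline like this gives the ML layer a physically grounded reference to be compared against or fused with.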


Section 05

System Deployment and Operation Mechanism

Microservice architecture: two services. The Web service exposes the public API and UI and serves predictions from a read-only cache so that it can handle high concurrency; the Worker/Cron service handles data acquisition, model inference, and prediction updates. The two services share a data volume.

Auto-update: the worker checks for new events every 10 minutes, applies a 5-minute debounce to avoid redundant recomputation, and runs a fallback recalculation every 3 hours to keep the data fresh.

Degradation strategy: if the ML ensemble fails, the system falls back through three levels (ETAS-Ogata → Poisson baseline → physics-aware demo data), so that some information remains available during failures.
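
The update and fallback behaviour can be sketched as two small functions. This is one possible reading of the description, assuming "debounce" means waiting until no new event has arrived for 5 minutes before recomputing; `UpdateState`, `should_recompute`, and `predict_with_fallback` are hypothetical names, and the predictors passed in stand for the real ML, ETAS, Poisson, and demo-data sources.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional

POLL_INTERVAL_S = 10 * 60         # worker polls for new events every 10 min (loop not shown)
DEBOUNCE_S = 5 * 60               # assumed: wait until the feed is quiet for 5 min
FALLBACK_RECOMPUTE_S = 3 * 3600   # force a recompute at least every 3 h


@dataclass
class UpdateState:
    last_event_seen: Optional[float] = None  # epoch s of newest event not yet processed
    last_recompute: float = 0.0              # epoch s of the last completed recompute


def should_recompute(state: UpdateState, now: float) -> bool:
    """Called on every poll; decides whether the worker should rerun inference."""
    if now - state.last_recompute >= FALLBACK_RECOMPUTE_S:
        return True  # periodic fallback keeps forecasts fresh with or without new events
    if state.last_event_seen is None:
        return False  # nothing new since the last recompute
    # Debounce: a burst of events (e.g. an aftershock sequence) triggers one recompute
    return now - state.last_event_seen >= DEBOUNCE_S


def predict_with_fallback(
    chain: list[tuple[str, Callable[[], dict]]],
) -> tuple[str, dict]:
    """Try each predictor in order and return (source_name, prediction) from the first
    that succeeds, e.g. ML ensemble -> ETAS -> Poisson -> demo data."""
    errors = []
    for name, predictor in chain:
        try:
            return name, predictor()
        except Exception as exc:  # any failure drops to the next level
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all predictors failed: " + "; ".join(errors))


source, pred = predict_with_fallback([
    ("ml_ensemble", lambda: 1 / 0),              # simulate a failed model
    ("etas_ogata", lambda: {"p_30d_m5": 0.12}),  # hypothetical output
])
print(source, pred)  # etas_ogata {'p_30d_m5': 0.12}
```

Returning the source name alongside the prediction lets the Web service label which level produced a forecast, so users can tell a degraded result from a full ML one.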


Section 06

Scientific Rigor and Limitations Explanation

SeismicID states clearly that its output is a probabilistic forecast for ranking relative risk, not a deterministic early warning: a low probability does not mean an area is safe, and a high probability does not mean an earthquake will certainly occur. Decisions should be based on official warnings such as those from BMKG. The project's methodology has been refined through scientific audits; the resulting improvements include multi-window b-value calculation, Mc estimation, and the Reasenberg declustering algorithm.
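
The seismological estimates mentioned above can be sketched with two standard techniques: the maximum-curvature method for Mc (the most populated magnitude bin) and the Aki-Utsu maximum-likelihood b-value, b = log10(e) / (mean(M) − (Mc − ΔM/2)). Whether SeismicID uses these particular estimators is an assumption, as are the look-back windows (1, 5, and 10 years) and the 50-event minimum. Reasenberg declustering, which would normally be applied to the catalog first, is not shown.

```python
from __future__ import annotations

import math
from collections import Counter
from typing import Iterable, Optional, Sequence


def mc_maxc(mags: Sequence[float], bin_width: float = 0.1,
            correction: float = 0.0) -> float:
    """Magnitude of completeness by maximum curvature: the most populated magnitude
    bin of the non-cumulative frequency-magnitude distribution (plus an optional
    empirical correction)."""
    counts = Counter(round(m / bin_width) for m in mags)  # integer bin indices
    k_max = max(counts, key=lambda k: (counts[k], -k))  # ties -> lower magnitude
    return k_max * bin_width + correction


def b_value_aki(mags: Sequence[float], mc: float, bin_width: float = 0.1) -> float:
    """Aki maximum-likelihood b-value with Utsu's half-bin correction.
    Pass bin_width=0 for continuous (unbinned) magnitudes."""
    above = [m for m in mags if m >= mc - 1e-9]
    if len(above) < 2:
        raise ValueError("too few events above Mc")
    mean_m = sum(above) / len(above)
    return math.log10(math.e) / (mean_m - (mc - bin_width / 2))


def multi_window_b(events: Iterable[tuple[float, float]], now_day: float,
                   windows_days: Sequence[int] = (365, 1825, 3650),
                   min_events: int = 50,
                   bin_width: float = 0.1) -> dict[int, Optional[float]]:
    """b-value for several look-back windows; None where too few events exceed Mc.
    `events` are (time_in_days, magnitude) pairs for one cell or region."""
    events = list(events)
    out: dict[int, Optional[float]] = {}
    for w in windows_days:
        mags = [m for t, m in events if 0 <= now_day - t <= w]
        if not mags:
            out[w] = None
            continue
        mc = mc_maxc(mags, bin_width)
        n_above = sum(1 for m in mags if m >= mc - 1e-9)
        out[w] = b_value_aki(mags, mc, bin_width) if n_above >= min_events else None
    return out


# Synthetic Gutenberg-Richter sample with b = 1 above M4.0 (deterministic quantiles)
n = 10_000
synthetic = [4.0 - math.log10(1 - (i + 0.5) / n) for i in range(n)]
print(round(b_value_aki(synthetic, mc=4.0, bin_width=0.0), 2))  # 1.0
```

Computing b over several windows lets a feature capture changes in the ratio of small to large events, while the `None` results make sparse cells explicit instead of producing unstable estimates.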


Section 07

Practical Application Value and Scenarios

SeismicID can serve as a supporting tool in several settings:

  • Government: Identify areas needing enhanced monitoring and emergency preparedness;
  • Insurance companies: Reference for earthquake risk pricing and reserve assessment;
  • Research institutions: Seismological hypothesis generation and data exploration;
  • Public education: Improve understanding of earthquake probability and risk.

Section 08

Technical Highlights and Project Insights

Technical highlights:

  • Physics-informed features: combining domain knowledge with ML improves interpretability;
  • Probability calibration: outputs are calibrated so that predicted probabilities match observed frequencies;
  • Multi-level degradation: fallback models keep the system available when components fail;
  • Scientific transparency: model cards, interpretation guides, and explicit statements of limitations.

Insights: natural-disaster prediction projects should emphasize the integration of domain knowledge, probability calibration, reliable system design, and candid statements of limitations. SeismicID offers a useful reference for applying AI responsibly in science and public safety.