The toolkit adopts a modular design in which components are connected through well-defined data-flow interfaces. The core modules are:
Data Acquisition Layer: ogsdownloader.py is built on ObsPy's MassDownloader and supports batch downloads of waveform data from multiple FDSN data centers, including INGV, GFZ, IRIS, ETH, and ORFEUS. It supports rectangular or circular geographic domains, stores the downloaded data automatically in date-based directories, and handles EIDA token authentication for access to restricted data.
Format Parsing Layer: Each of the four proprietary formats historically used by OGS (.dat, .hpl, .pun, .txt) has a dedicated parser, and all four expose a consistent interface through the shared OGSDataFile abstract base class. Parsed data is converted into pandas DataFrames for subsequent analysis.
Catalog Management Core: ogscatalog.py is the heart of the toolkit, providing lazy loading, geofence filtering, and partitioned Parquet storage. It supports efficient filtering by date range and geographic polygon, and includes built-in visualization methods such as event distribution maps, cumulative curves, and magnitude histograms.
Clustering Analysis Engine: ogsclustering.py implements 13 clustering algorithms: K-Means, MiniBatchKMeans, BisectingKMeans, DBSCAN, HDBSCAN, OPTICS, Advanced Density Peaks, Hierarchical Clustering, Feature Hierarchical Clustering, Affinity Propagation, Mean Shift, Spectral Clustering, and Birch. Each algorithm comes with a hyperparameter optimization mechanism, and their performance is compared through a unified evaluation-metric interface (Silhouette Coefficient, Calinski-Harabasz Index, Davies-Bouldin Index, among others).
Catalog Comparison System: ogscompare.py implements a catalog comparison framework based on the Bipartite Graph Matching Algorithm (BGMA). It matches events and phases between two catalogs within time and space tolerances, and generates confusion matrices along with true-positive, false-negative, and false-positive statistics.
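
To make the catalog comparison idea more concrete, the following is a minimal sketch of tolerance-based bipartite matching between two event lists. It is not the ogscompare.py implementation: the Event class, distance_km, match_catalogs, and the max_dt / max_km tolerances are hypothetical names chosen for illustration. The sketch uses an unweighted maximum-cardinality matching found with augmenting paths (Kuhn's algorithm), whereas the toolkit's BGMA may weight or rank candidate pairs differently.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Event:
    """Hypothetical minimal event record (not the toolkit's data model)."""
    time: float  # origin time in seconds, e.g. Unix epoch
    lat: float   # latitude in degrees
    lon: float   # longitude in degrees


def distance_km(a, b):
    """Great-circle distance between two events (haversine formula)."""
    p1, p2 = radians(a.lat), radians(b.lat)
    dlat = p2 - p1
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(p1) * cos(p2) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def match_catalogs(reference, candidate, max_dt=2.0, max_km=10.0):
    """Match events one-to-one within time and space tolerances.

    Returns the matched (reference_index, candidate_index) pairs and a
    dictionary with TP / FN / FP counts relative to the reference catalog.
    """
    # Bipartite graph: an edge joins a reference event and a candidate
    # event when both tolerances are satisfied.
    edges = [
        [j for j, c in enumerate(candidate)
         if abs(r.time - c.time) <= max_dt and distance_km(r, c) <= max_km]
        for r in reference
    ]
    owner = {}  # candidate index -> reference index currently matched to it

    def augment(i, seen):
        # Try to match reference event i, re-routing earlier matches if needed.
        for j in edges[i]:
            if j in seen:
                continue
            seen.add(j)
            if j not in owner or augment(owner[j], seen):
                owner[j] = i
                return True
        return False

    for i in range(len(reference)):
        augment(i, set())

    pairs = sorted((i, j) for j, i in owner.items())
    tp = len(pairs)
    counts = {"TP": tp, "FN": len(reference) - tp, "FP": len(candidate) - tp}
    return pairs, counts


if __name__ == "__main__":
    ref = [Event(0.0, 46.00, 13.00), Event(1.0, 46.00, 13.00),
           Event(100.0, 46.50, 13.50)]
    cand = [Event(0.8, 46.01, 13.00), Event(-0.5, 46.00, 13.01),
            Event(500.0, 45.00, 12.00)]
    print(match_catalogs(ref, cand, max_dt=1.0, max_km=10.0))
```

In this example the first reference event is compatible with both of the first two candidates, while the second reference event is compatible only with the first candidate. A greedy first-come assignment would leave the second reference event unmatched; the augmenting-path step re-routes the first match so that both are paired, giving two true positives, one false negative, and one false positive.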