Zing Forum

Predicting Dynamic Correlations Between Cryptocurrencies and Traditional Assets Using Machine Learning

A master's study from the University of Kragujevac in Serbia explores cross-market dependencies between Bitcoin and traditional assets such as stocks, precious metals, and the US dollar, using rolling-window correlations and several machine learning models. It finds that serial persistence, the tendency of recent correlation levels to carry forward, is the core driver of prediction.

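Tags: machine learning, cryptocurrency, Bitcoin, cross-market dependence, time-series forecasting, walk-forward, DCC-GARCH, financial forecasting, asset allocation, Serbia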
Published 2026-05-31 23:40 | Recent activity 2026-05-31 23:48 | Estimated read 6 min

Section 01

[Introduction] Study on Predicting Dynamic Correlations Between Cryptocurrencies and Traditional Assets Using Machine Learning

The study is by Bogdan Babaev (b0gdaan), a master's student in Artificial Intelligence at the Faculty of Engineering, University of Kragujevac in Serbia. It explores cross-market dependencies between Bitcoin and traditional assets such as stocks, precious metals, and the US dollar using rolling-window correlations and several machine learning models. Under a walk-forward evaluation framework, it finds that serial persistence is the core driver of prediction. The research results are open-sourced on GitHub (https://github.com/b0gdaan/master-thesis) and were published on May 31, 2026.

Section 02

Research Background and Motivation

The relationship between cryptocurrencies and traditional financial markets draws close attention from investors and researchers. Bitcoin is the largest cryptocurrency, so whether its correlation with stocks, precious metals, and the US dollar stays stable matters for asset allocation and risk management. Traditional DCC-GARCH models are limited by rigid assumptions; this study instead combines machine learning with a walk-forward framework to look for predictable structure in cross-market dependencies.

Section 03

Data Sources and Asset Selection

The study uses daily data from 2017 to 2026 for seven assets: two cryptocurrencies (BTC-USD, ETH-USD) and five traditional instruments (S&P 500 ^GSPC, Nasdaq Composite ^IXIC, Gold ETF GLD, Silver ETF SLV, US Dollar Index ETF UUP). All data come from Yahoo Finance, and rolling-window Pearson correlation coefficients serve as the prediction target.
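To make the prediction target concrete, here is a minimal sketch of a rolling-window Pearson correlation. It assumes the correlation is taken over daily log returns (the summary does not say whether returns or prices are used) and runs on synthetic prices rather than Yahoo Finance data. The names log_returns, pearson, and rolling_correlation are illustrative and are not taken from the thesis repository.

```python
import math
import random


def log_returns(prices):
    """Daily log returns from a list of closing prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined if a window is constant


def rolling_correlation(x, y, window):
    """Trailing-window correlation; None until the first full window."""
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    out = []
    for t in range(len(x)):
        if t + 1 < window:
            out.append(None)
        else:
            lo = t + 1 - window
            out.append(pearson(x[lo:t + 1], y[lo:t + 1]))
    return out


# Synthetic stand-ins for two price series sharing a common shock (not real data).
rng = random.Random(0)
btc, spx = [100.0], [100.0]
for _ in range(250):
    common = rng.gauss(0, 0.01)
    btc.append(btc[-1] * math.exp(common + rng.gauss(0, 0.03)))
    spx.append(spx[-1] * math.exp(common + rng.gauss(0, 0.005)))

rho_30 = rolling_correlation(log_returns(btc), log_returns(spx), window=30)
print(rho_30[29], rho_30[-1])  # first and last 30-day correlation
```

Because consecutive windows share all but one observation, the resulting series moves slowly by construction, which is one reason persistence-based forecasts are hard to beat on this kind of target.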

Section 04

Methodology Design

The target variable is the rolling-window correlation, computed over 14-, 30-, 60-, and 90-day windows and Fisher-z transformed to stabilize variance. Feature engineering extracts momentum, volatility, and return features. The model lineup includes simple benchmarks (Naive_Last, AR(1), HAR), machine learning models (ElasticNet, Ridge, Adaptive Ensemble, Random Forest, GBM, XGBoost), and an econometric benchmark (DCC-GARCH(1,1)). Evaluation uses walk-forward expanding windows (minimum training set of 800 observations, refitting every 20 days) and the Diebold-Mariano test with Newey-West correction.
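The evaluation loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the thesis pipeline: it forecasts one step ahead, compares only a last-value benchmark with an OLS-fitted AR(1), and uses a synthetic correlation series. The function names (fisher_z, fit_ar1, walk_forward, rmse) and the clipping constant eps are hypothetical; only the 800-observation minimum and the 20-day refit interval come from the summary.

```python
import math
import random


def fisher_z(r, eps=1e-6):
    """Fisher z-transform, atanh(r); r is clipped to keep the result finite."""
    r = max(-1.0 + eps, min(1.0 - eps, r))
    return math.atanh(r)


def fit_ar1(series):
    """OLS fit of z_t = a + b * z_{t-1}; returns (a, b)."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def walk_forward(series, min_train=800, refit_every=20):
    """One-step-ahead forecasts on an expanding window, refit on a schedule."""
    actual, naive, ar1 = [], [], []
    a = b = None
    for t in range(min_train, len(series)):
        if (t - min_train) % refit_every == 0:
            a, b = fit_ar1(series[:t])  # uses only observations before t
        last = series[t - 1]
        naive.append(last)            # last-value benchmark
        ar1.append(a + b * last)      # AR(1) forecast
        actual.append(series[t])
    return actual, naive, ar1


def rmse(actual, pred):
    """Root mean squared error."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(actual, pred)) / len(actual))


# Synthetic persistent correlation series (an assumption, not thesis data).
rng = random.Random(1)
z_true, rho = 0.0, []
for _ in range(1200):
    z_true = 0.02 + 0.95 * z_true + rng.gauss(0, 0.05)
    rho.append(math.tanh(z_true))

z = [fisher_z(r) for r in rho]
actual, naive, ar1 = walk_forward(z)
print(len(actual), rmse(actual, naive), rmse(actual, ar1))
```

The key property is that each forecast for day t is produced by a model fitted only on data before t, so no future information leaks into the error estimates.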

Section 05

Key Findings

  1. Cross-market dependencies are predictable. Ridge, AR(1), and HAR have the best performance (RMSE: 0.0656-0.0659, R²: ~0.942-0.943), with serial persistence as the dominant factor.
  2. All machine learning models significantly outperform DCC-GARCH (RMSE: 0.2136).
  3. An investor signal layer is built to detect "stress days" of traditional assets, demonstrating practical application value.
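Claims of significance such as the one against DCC-GARCH rest on the Diebold-Mariano test with Newey-West correction. A minimal sketch of that test follows. Several choices here are assumptions, since the summary does not give them: squared-error loss, a Bartlett kernel, a rule-of-thumb lag length, and a normal approximation for the p-value. The function name diebold_mariano and the synthetic error series are illustrative only.

```python
import math
import random


def diebold_mariano(e1, e2, lags=None):
    """DM statistic and two-sided p-value for equal squared-error accuracy.

    A positive statistic means the first forecast has the larger loss.
    """
    n = len(e1)
    d = [a * a - b * b for a, b in zip(e1, e2)]  # loss differential
    d_bar = sum(d) / n
    if lags is None:
        lags = int(4 * (n / 100) ** (2 / 9))  # rule-of-thumb lag (assumption)
    dc = [v - d_bar for v in d]
    # Newey-West long-run variance with Bartlett weights.
    lrv = sum(v * v for v in dc) / n
    for k in range(1, lags + 1):
        gamma_k = sum(dc[t] * dc[t - k] for t in range(k, n)) / n
        lrv += 2.0 * (1.0 - k / (lags + 1)) * gamma_k
    stat = d_bar / math.sqrt(lrv / n)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))  # standard normal tails
    return stat, p_value


# Synthetic forecast errors: one noisy model, one accurate model (not thesis data).
rng = random.Random(2)
errors_weak = [rng.gauss(0, 1.0) for _ in range(500)]
errors_strong = [rng.gauss(0, 0.3) for _ in range(500)]

stat, p_value = diebold_mariano(errors_weak, errors_strong)
print(stat, p_value)
```

The Newey-West term matters because loss differentials from overlapping rolling windows are serially correlated, and ignoring that tends to overstate significance.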

Section 06

Practical Implications and Insights

For investors, the forecasts:
  1. Support diversified asset allocation.
  2. Indicate time windows for risk hedging.
  3. Allow positions to be adjusted dynamically around "stress days".

For researchers:
  1. Simple linear models combined with feature engineering can achieve excellent performance.
  2. Walk-forward evaluation ensures reliable results.
  3. Combining econometrics and machine learning provides a comprehensive performance picture.

Section 07

Limitations and Future Directions

Limitations: the study considers only linear correlation, the sample includes the abnormal fluctuations of 2020-2021, and other alternative assets are not covered.

Future directions: explore nonlinear dependence measures such as copulas, test LSTM/Transformer time-series models, use high-frequency data, and extend the work to causal inference.

Section 08

Highlights of Technical Implementation

  1. Reproducibility: complete process code (main.py, run_all.py), parameter configuration (config.yaml), and 7 Jupyter notebooks.
  2. Modular architecture: pipeline.py (core process), dcc.py (DCC benchmark), signal_layer.py (signal layer), etc.
  3. Parallel computing: ThreadPoolExecutor for parallel experiments, with support for XGBoost GPU acceleration.
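As a rough illustration of the parallel-experiment idea, the sketch below fans a grid of asset pairs and window sizes out over a ThreadPoolExecutor. It does not reproduce run_all.py: run_experiment is a hypothetical placeholder for one walk-forward run, and the pair list and worker count are assumptions chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

WINDOWS = (14, 30, 60, 90)  # rolling-window lengths named in the summary
PAIRS = (("BTC-USD", "^GSPC"), ("BTC-USD", "GLD"), ("BTC-USD", "UUP"))  # illustrative subset


def run_experiment(pair, window):
    """Hypothetical placeholder for one walk-forward experiment."""
    return {"pair": pair, "window": window, "status": "done"}


def run_all(max_workers=4):
    """Run every (pair, window) combination on a thread pool."""
    jobs = list(product(PAIRS, WINDOWS))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the submitted jobs.
        return list(pool.map(lambda job: run_experiment(*job), jobs))


results = run_all()
print(len(results), results[0])
```

Threads pay off here mainly when each experiment spends its time in code that releases Python's global interpreter lock, as native numerical libraries typically do; pure-Python workloads would usually need processes instead.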