The quality of financial data directly affects the model's predictive ability. This project applies several statistical techniques to make its input data more reliable:
Winsorization: a method for handling outliers that caps extreme values at chosen lower and upper percentiles (for example, the 1st and 99th). Unlike deleting outliers, it keeps every observation, so it reduces the influence of anomalous data on the model while largely preserving the shape of the distribution.
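A minimal sketch of percentile-based winsorization in NumPy follows. The function name `winsorize`, its `lower_pct`/`upper_pct` parameters and the sample returns are hypothetical illustrations, not the project's code. The 10th/90th bounds in the usage line are chosen only because the toy array has ten points; real pipelines typically use tighter tails such as 1%/99%.

```python
import numpy as np

def winsorize(x, lower_pct=1.0, upper_pct=99.0):
    """Cap values below the lower_pct percentile and above the upper_pct
    percentile at those percentile values. No observations are dropped.
    (Hypothetical helper; default bounds are illustrative assumptions.)"""
    x = np.asarray(x, dtype=float)
    # Percentile cut-offs, computed with NumPy's default linear interpolation
    lo, hi = np.percentile(x, [lower_pct, upper_pct])
    # Values outside [lo, hi] are replaced by the nearest bound
    return np.clip(x, lo, hi)

# Hypothetical daily returns containing two extreme values (0.35 and -0.28)
returns = np.array([0.01, -0.02, 0.015, 0.003, -0.004,
                    0.35, -0.01, 0.002, -0.28, 0.007])
clean = winsorize(returns, lower_pct=10, upper_pct=90)
print(clean)
```

Clipping to interpolated percentiles is one common variant; SciPy's `scipy.stats.mstats.winsorize` offers a rank-based alternative controlled by its `limits` argument.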
Augmented Dickey-Fuller (ADF) test: a test of whether a time series is stationary. Its null hypothesis is that the series has a unit root, i.e. is non-stationary. Financial series such as prices often contain a unit root; if the test fails to reject the null, the data should be differenced (for example, converting prices to returns) to satisfy the stationarity assumption of many statistical models.
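The following is a minimal sketch of the ADF regression and the "difference until stationary" decision, written with NumPy only. Assumptions: the helper names `adf_tstat` and `difference_until_stationary` are hypothetical, the lag order is fixed at 1, the regression includes a constant but no trend, and the decision uses the asymptotic 5% critical value of about -2.86 rather than a p-value. In practice, `statsmodels.tsa.stattools.adfuller` selects the lag order automatically (AIC by default) and reports p-values.

```python
import numpy as np

# Asymptotic 5% critical value for the ADF t-statistic with a constant and
# no trend (MacKinnon). Assumed decision threshold for this sketch.
ADF_CRIT_5PCT = -2.86

def adf_tstat(y, lags=1):
    """t-statistic on gamma in
        dy_t = alpha + gamma * y_{t-1} + sum_{i=1..lags} beta_i * dy_{t-i} + e_t.
    H0: gamma = 0 (unit root). Strongly negative values reject H0."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                               # dy[k] = y[k+1] - y[k]
    target = dy[lags:]                            # dy_t
    cols = [np.ones_like(target), y[lags:-1]]     # constant, y_{t-1}
    for i in range(1, lags + 1):
        cols.append(dy[lags - i:len(dy) - i])     # lagged differences dy_{t-i}
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    sigma2 = resid @ resid / (len(target) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)         # OLS coefficient covariance
    return beta[1] / np.sqrt(cov[1, 1])

def difference_until_stationary(y, lags=1, max_diff=2):
    """Difference y until the ADF test rejects a unit root at 5%.
    Returns (stationary_series, number_of_differences_applied)."""
    y = np.asarray(y, dtype=float)
    for d in range(max_diff + 1):
        if adf_tstat(y, lags) < ADF_CRIT_5PCT:
            return y, d
        y = np.diff(y)
    raise ValueError("unit root not rejected after max_diff differences")

# Usage on a simulated random-walk "price" series (hypothetical data)
rng = np.random.default_rng(42)
prices = 100 + np.cumsum(rng.normal(size=1000))
series, d = difference_until_stationary(prices)
print("differences applied:", d)
```

For a random walk the level series usually fails to reject the null and one difference is enough, which mirrors the usual move from prices to returns.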
Running these preprocessing steps automatically ensures that every dataset passes the same outlier and stationarity checks before it enters the model training phase.