1. Data Acquisition and Preprocessing
Bitcoin price data usually includes fields such as Open, High, Low, Close, and Volume (OHLCV). The data processing steps involved in the project may include:
- Obtain historical K-line (candlestick) data from exchange APIs or public data sources
- Align timestamps to a regular interval and fill missing values
- Calculate logarithmic returns so that the series is closer to stationary than raw prices
- Split into training, validation, and test sets in chronological order, so that no future data leaks into training
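The steps above can be sketched in plain Python. This is a minimal illustration, not the project's actual pipeline: forward-filling as the gap-filling rule, the 70/15/15 split ratios, and the function names are all assumptions. In practice a library such as pandas would usually handle this.

```python
import math

def forward_fill(values):
    """Replace None entries with the most recent known value (assumed fill rule)."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

def log_returns(closes):
    """r_t = ln(C_t / C_{t-1}); the result is one element shorter than the input."""
    return [math.log(b / a) for a, b in zip(closes, closes[1:])]

def chronological_split(rows, train_frac=0.7, val_frac=0.15):
    """Split without shuffling, so every test row is later than every training row."""
    n = len(rows)
    i = round(n * train_frac)
    j = round(n * (train_frac + val_frac))
    return rows[:i], rows[i:j], rows[j:]

# Toy close prices with one missing candle.
closes = forward_fill([100.0, None, 110.0, 121.0, 121.0])
returns = log_returns(closes)
train, val, test = chronological_split(list(range(20)))
```

The split is positional rather than random on purpose: shuffling a time series would let the model train on candles that come after the ones it is evaluated on.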
2. Feature Engineering
Effective features are key to the success of machine learning models. In trend prediction tasks, common feature categories include:
Technical Indicator Features:
- Moving averages (SMA, EMA) and their cross signals
- Relative Strength Index (RSI) to judge overbought/oversold conditions
- MACD indicator to capture trend momentum
- Bollinger Bands to measure volatility
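As a rough sketch of how some of these indicators are computed, the functions below implement SMA, EMA, a simple-average variant of RSI, and MACD in plain Python. The window lengths (14 for RSI, 12/26/9 for MACD) are the conventional defaults rather than values taken from the project, and the RSI here averages gains and losses with a plain mean, whereas Wilder's original formulation uses exponential smoothing.

```python
def sma(values, window):
    """Simple moving average; None until a full window is available."""
    out = [None] * len(values)
    for i in range(window - 1, len(values)):
        out[i] = sum(values[i - window + 1 : i + 1]) / window
    return out

def ema(values, span):
    """Exponential moving average seeded with the first value."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out

def rsi(values, period=14):
    """Simple-average RSI over the last `period` price changes."""
    out = [None] * len(values)
    changes = [b - a for a, b in zip(values, values[1:])]
    for i in range(period, len(values)):
        window = changes[i - period : i]
        gain = sum(c for c in window if c > 0) / period
        loss = -sum(c for c in window if c < 0) / period
        out[i] = 100.0 if loss == 0 else 100.0 - 100.0 / (1.0 + gain / loss)
    return out

def macd(values, fast=12, slow=26, signal=9):
    """MACD line (fast EMA minus slow EMA) and its signal line."""
    line = [f - s for f, s in zip(ema(values, fast), ema(values, slow))]
    return line, ema(line, signal)

# Toy, steadily rising prices (illustrative only).
prices = [float(p) for p in range(100, 130)]
macd_line, signal_line = macd(prices)
rsi_values = rsi(prices)
```

A cross signal can then be derived by comparing two averages at the same index, for example flagging bars where a short SMA is above a long SMA.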
Price Behavior Features:
- Position of current price relative to recent highs and lows
- Candlestick pattern encoding (e.g., hammer, engulfing patterns)
- Volatility indicators (ATR, historical volatility)
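Two of these price behavior features can be sketched as follows: the position of the close within the recent high/low range, and the Average True Range (ATR). The function names and default windows are illustrative assumptions, and the ATR shown uses a plain mean of true ranges rather than Wilder's smoothing.

```python
def range_position(highs, lows, closes, window=20):
    """Close position within the last `window` bars: 0 = at the low, 1 = at the high."""
    out = [None] * len(closes)
    for i in range(window - 1, len(closes)):
        hi = max(highs[i - window + 1 : i + 1])
        lo = min(lows[i - window + 1 : i + 1])
        # A flat range has no meaningful position; use the midpoint by convention.
        out[i] = 0.5 if hi == lo else (closes[i] - lo) / (hi - lo)
    return out

def true_ranges(highs, lows, closes):
    """True range per bar; the first bar has no previous close, so use high - low."""
    out = [highs[0] - lows[0]]
    for i in range(1, len(closes)):
        prev = closes[i - 1]
        out.append(max(highs[i] - lows[i], abs(highs[i] - prev), abs(lows[i] - prev)))
    return out

def atr(highs, lows, closes, period=14):
    """Mean of the last `period` true ranges; None until enough bars exist."""
    tr = true_ranges(highs, lows, closes)
    out = [None] * len(tr)
    for i in range(period - 1, len(tr)):
        out[i] = sum(tr[i - period + 1 : i + 1]) / period
    return out

# Three toy candles (illustrative only).
highs = [10.0, 12.0, 11.0]
lows = [8.0, 9.0, 9.0]
closes = [9.0, 11.0, 10.0]
position = range_position(highs, lows, closes, window=3)
atr_values = atr(highs, lows, closes, period=2)
```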
Time Features:
- Periodic factors such as hour of day, day of week, and month
- Whether the timestamp falls on a holiday or within a major event window
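A small sketch of time feature extraction from a Unix timestamp is shown here. The sine/cosine encoding of the hour is a common way to let a model see that 23:00 and 00:00 are adjacent; it is an assumption added for illustration, as is the hypothetical `event_dates` set of ISO date strings.

```python
import math
from datetime import datetime, timezone

def time_features(ts, event_dates=frozenset()):
    """Calendar features for a Unix timestamp (seconds), interpreted in UTC."""
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    angle = 2 * math.pi * dt.hour / 24
    return {
        "hour": dt.hour,
        "day_of_week": dt.weekday(),  # Monday = 0
        "month": dt.month,
        "hour_sin": math.sin(angle),  # cyclical encoding of the hour
        "hour_cos": math.cos(angle),
        "is_event_day": dt.date().isoformat() in event_dates,
    }

# Hypothetical event calendar; the date is a placeholder, not a real event.
features = time_features(0, event_dates={"1970-01-01"})
```

Tree models such as XGBoost can split on the raw integer fields directly, so the cyclical columns are optional and mainly help when the same features feed other model types.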
3. Label Definition Strategy
The label design for trend prediction directly affects the model's learning objectives. Common practices include:
- Direction Prediction: Whether the price moves up or down over the next N periods (binary classification problem)
- Amplitude Prediction: Future returns discretized into bins (multi-class classification problem)
- Signal Strength: A composite score combining direction and confidence
Which strategy the project adopts, and its parameters (such as the horizon N or the bin boundaries), should be chosen based on backtesting performance.
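The first two labeling schemes can be sketched as below. The horizon and the bin edges are placeholder values chosen for illustration, not the project's settings. Rows near the end of the series receive `None` because their future price is not yet known, and they should be dropped before training.

```python
import bisect
import math

def direction_labels(closes, horizon):
    """1 if the close is higher `horizon` bars ahead, else 0; None where unknown."""
    n = len(closes)
    return [
        (1 if closes[i + horizon] > closes[i] else 0) if i + horizon < n else None
        for i in range(n)
    ]

def binned_return_labels(closes, horizon, edges):
    """Class index of the forward log return, given sorted bin `edges`."""
    n = len(closes)
    labels = []
    for i in range(n):
        if i + horizon < n:
            r = math.log(closes[i + horizon] / closes[i])
            labels.append(bisect.bisect_right(edges, r))
        else:
            labels.append(None)
    return labels

# Toy closes; edges of +/-1.5% give three classes: 0 = down, 1 = flat, 2 = up.
closes = [100.0, 101.0, 99.0, 105.0, 105.0]
direction = direction_labels(closes, horizon=1)
amplitude = binned_return_labels(closes, horizon=1, edges=[-0.015, 0.015])
```

Because each label looks `horizon` bars into the future, features for a row must only use data up to that row's own timestamp; otherwise the model is trained on information it would not have at prediction time.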
4. Model Training and Parameter Tuning
Hyperparameter tuning of XGBoost is an important step to improve model performance:
| Parameter Category | Key Parameters | Tuning Suggestions |
| --- | --- | --- |
| Tree Structure | max_depth, min_child_weight | Control single tree complexity to prevent overfitting |
| Regularization | reg_alpha, reg_lambda | Balance bias and variance |
| Learning Rate | learning_rate, n_estimators | Lower learning rate with more trees |
| Sampling | subsample, colsample_bytree | Row/column sampling to increase randomness |
Tuning can be carried out with grid search, random search, or Bayesian optimization.
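Of these strategies, random search is the simplest to sketch. The example below samples the parameters listed in the table using only the standard library. The value ranges are assumptions for illustration, and `score_fn` is a hypothetical callback: in a real run it would train a model such as `xgboost.XGBClassifier(**params)` on the training split and return a validation metric, whereas here a stand-in scorer is used so that the sketch runs without any dependencies.

```python
import random

# Assumed search ranges; adjust to the dataset. Log-uniform draws are used
# for parameters that vary over orders of magnitude.
SEARCH_SPACE = {
    "max_depth": lambda rng: rng.randint(3, 8),
    "min_child_weight": lambda rng: rng.randint(1, 10),
    "reg_alpha": lambda rng: 10 ** rng.uniform(-3, 1),
    "reg_lambda": lambda rng: 10 ** rng.uniform(-3, 1),
    "learning_rate": lambda rng: 10 ** rng.uniform(-2, -0.5),
    "n_estimators": lambda rng: rng.randrange(200, 1001, 100),
    "subsample": lambda rng: rng.uniform(0.6, 1.0),
    "colsample_bytree": lambda rng: rng.uniform(0.6, 1.0),
}

def sample_params(rng):
    """Draw one random configuration from SEARCH_SPACE."""
    return {name: draw(rng) for name, draw in SEARCH_SPACE.items()}

def random_search(score_fn, n_trials=20, seed=0):
    """Return the best (params, score) found; higher scores are better."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = sample_params(rng)
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Stand-in scorer (placeholder only): pretends max_depth = 5 is ideal.
def placeholder_score(params):
    return -abs(params["max_depth"] - 5)

best_params, best_score = random_search(placeholder_score, n_trials=50, seed=0)
```

When the real scorer is plugged in, the validation data should come from a chronological split (or walk-forward folds) rather than shuffled cross-validation, for the same leakage reasons that apply to the train/test split.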