Zing Forum

XGBoost Stock Selection System: A Quantitative Investment Strategy for S&P 500 Integrating Technical Indicators and Macroeconomic Variables

A machine learning stock selection system built on XGBoost that combines technical indicators with macroeconomic variables to automate S&P 500 asset allocation. A backtest covering 2015 to 2025 reports a 70% return.

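Tags: XGBoost, quantitative investing, stock selection system, S&P 500, machine learning, macroeconomics, technical indicators, backtesting, asset allocation
Published 2026-06-04 07:14 | Recent activity 2026-06-04 07:20 | Estimated read 6 min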

Section 01

XGBoost Stock Selection System: Introduction to the S&P 500 Quantitative Strategy Integrating Technical Indicators and Macroeconomics

This project is a quantitative stock selection system based on XGBoost. It integrates technical indicators (RSI, MACD, etc.) and macroeconomic variables (10-year Treasury yield, CPI, etc.) to automate S&P 500 asset allocation. A backtest covering 2015 to 2025 reports a 70% return, an excess return of +20% relative to the benchmark. The key innovations are a three-class prediction framework that adds a cash allocation and a Gatekeeper confidence filtering mechanism. Together they let the system adapt dynamically to extreme market environments, which gives the project practical reference value.

Section 02

Project Background and Motivation

In quantitative investment, traditional technical analysis struggles to cope with macroeconomic upheavals, while macroeconomic analysis lacks precise entry timing. This project, a data science master's thesis from a Spanish university, proposes a hybrid approach: with XGBoost at its core, it combines technical and macro variables to build a stock selection system that adapts dynamically to market conditions. It covers ten years of S&P 500 data, from 2015 to 2025, a period that includes extreme environments such as the pandemic crash, quantitative easing, and interest rate hike cycles.

Section 03

Core Architecture and Innovative Design

  1. Three-class prediction framework: Buy (Top10), Hold, Cash. The cash option lets the system step aside during high-risk periods;
  2. Feature engineering: Technical indicators (RSI, MACD, distance from SMA200) + macro variables (10-year Treasury yield, CPI, yield curve, VIX);
  3. Time series alignment: Forward filling and lagging of lower-frequency series to avoid look-ahead bias;
  4. Gatekeeper mechanism: A 45% confidence threshold filters out marginal predictions, reducing transaction costs and improving the win rate.
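
The Gatekeeper step in item 4 can be illustrated with a minimal sketch. It assumes the classifier returns one probability per class in the order (buy, hold, cash) and that a prediction below the threshold falls back to "hold". The class order, the function name, and the fallback rule are assumptions made for illustration; only the 45% threshold comes from the text above.

```python
# Hypothetical sketch: class order, function name, and fallback rule are
# assumptions. Only the 45% threshold is taken from the article.
CLASSES = ("buy", "hold", "cash")
CONFIDENCE_THRESHOLD = 0.45


def gatekeeper(probabilities, threshold=CONFIDENCE_THRESHOLD, fallback="hold"):
    """Return the predicted class, or `fallback` when confidence is too low.

    `probabilities` holds one probability per entry of CLASSES.
    """
    if len(probabilities) != len(CLASSES):
        raise ValueError("expected one probability per class")
    best_index = max(range(len(CLASSES)), key=lambda i: probabilities[i])
    if probabilities[best_index] < threshold:
        return fallback  # marginal prediction: do not trade on it
    return CLASSES[best_index]


# Placeholder tickers with made-up probabilities.
signals = {
    "AAA": gatekeeper([0.62, 0.28, 0.10]),  # top class clears the threshold
    "BBB": gatekeeper([0.40, 0.35, 0.25]),  # top class is below 45%
    "CCC": gatekeeper([0.15, 0.30, 0.55]),  # confident move to cash
}
print(signals)
```

Filtering this way trades fewer signals for fewer low-conviction trades, which is how a threshold of this kind can lower turnover and transaction costs.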

Section 04

Backtest Results and Analysis

The backtest engine models transaction commissions (0.25%), slippage, and rebalancing cycles. Key results for 2015 to 2025: an AI portfolio return of 70.04%, an excess return of +20% relative to the benchmark, and dynamic cash allocation that automatically reduces positions during high-volatility periods. The strategy outperforms traditional buy-and-hold in extreme market phases (e.g., the 2020 pandemic circuit breakers and the 2022 rate-hike bear market).
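
To show how commissions and slippage can enter a backtest, here is a minimal sketch that charges costs on the traded share of the portfolio at each rebalance. The 0.25% commission is taken from the text. The slippage rate, the function names, and the simplification of subtracting costs directly from the period return are assumptions, not the project's actual engine.

```python
COMMISSION_RATE = 0.0025  # 0.25% commission, as stated in the article
SLIPPAGE_RATE = 0.0005    # assumed value; the article gives no figure


def turnover(old_weights, new_weights):
    """Sum of absolute weight changes across all tickers (buys plus sells)."""
    tickers = set(old_weights) | set(new_weights)
    return sum(
        abs(new_weights.get(t, 0.0) - old_weights.get(t, 0.0)) for t in tickers
    )


def net_period_return(gross_return, old_weights, new_weights,
                      commission=COMMISSION_RATE, slippage=SLIPPAGE_RATE):
    """Gross period return minus costs charged on the traded notional."""
    cost = turnover(old_weights, new_weights) * (commission + slippage)
    return gross_return - cost


# Placeholder tickers: half the portfolio is swapped from BBB into CCC.
old = {"AAA": 0.5, "BBB": 0.5}
new = {"AAA": 0.5, "CCC": 0.5}
net = net_period_return(0.02, old, new)
print(f"turnover={turnover(old, new):.2f}, net return={net:.4f}")
```

Because costs scale with turnover, any mechanism that suppresses marginal trades directly reduces this drag.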

Section 05

Interpretability and Reproducibility Guide

SHAP analysis reveals the feature contributions: macro factors (VIX, yield curve) dominate around economic turning points, while technical factors (RSI, MACD) matter more in bull markets. The code runs the entire pipeline in a Jupyter Notebook and ships with a data directory and a dependency list (XGBoost, SHAP, etc.), which makes the results easy to reproduce.
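
SHAP attributions are Shapley values, so the idea can be shown on a toy example without the project's model. The sketch below computes exact Shapley values by brute force for a small stand-in scoring function. The function, the feature values, and the baseline are all invented for illustration. This is not the project's XGBoost model, and real analyses would use the `shap` library instead of enumerating subsets.

```python
from itertools import combinations
from math import factorial


def toy_model(features):
    """Stand-in scoring function (NOT the project's XGBoost model)."""
    return (-0.5 * features["vix"]
            + 2.0 * features["yield_curve"]
            + 0.1 * features["rsi"]
            - 0.05 * features["vix"] * features["yield_curve"])  # interaction


def shapley_values(model, instance, baseline):
    """Exact Shapley values; features outside a coalition use the baseline."""
    names = list(instance)
    n = len(names)

    def value(subset):
        mixed = {k: (instance[k] if k in subset else baseline[k]) for k in names}
        return model(mixed)

    result = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {name}) - value(coalition))
        result[name] = total
    return result


# Illustrative values only: a stressed instance versus a calm baseline.
instance = {"vix": 35.0, "yield_curve": -0.5, "rsi": 40.0}
baseline = {"vix": 18.0, "yield_curve": 1.0, "rsi": 50.0}
phi = shapley_values(toy_model, instance, baseline)
print(phi)
```

The attributions sum to the difference between the model output for the instance and for the baseline, which is the property that makes per-feature contributions comparable across market regimes.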

Section 06

Limitations and Risk Warnings

The following limitations should be noted:

  1. Historical backtesting ≠ future performance; the model may overfit to specific market regimes;
  2. Survivorship bias (ignoring companies removed from the index);
  3. Liquidity assumptions (small-cap stocks or extreme markets may have insufficient liquidity);
  4. Look-ahead bias introduced by macro data revisions;
  5. Overfitting risk (hyperparameter tuning of a single model may lead to over-optimization within the sample).
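
One common way to limit the look-ahead bias mentioned in item 4 is point-in-time alignment: each trading date sees only the macro value that had already been published, stored as first released instead of as later revised. The sketch below assumes a list of release records sorted by release date. The record layout, the function name, and the numbers are hypothetical and do not represent real CPI data.

```python
from bisect import bisect_left
from datetime import date

# Hypothetical first-release records, sorted by release date:
# (release_date, reference_month, value). Values are synthetic.
MACRO_RELEASES = [
    (date(2024, 2, 13), "2024-01", 100.0),
    (date(2024, 3, 12), "2024-02", 101.0),
    (date(2024, 4, 10), "2024-03", 102.0),
]


def value_known_before(releases, trading_date):
    """Return the latest value released strictly before `trading_date`.

    Requiring an earlier release date is a conservative choice: a figure
    published during the trading day is not used until the next day.
    Returns None when nothing has been released yet.
    """
    release_dates = [record[0] for record in releases]
    position = bisect_left(release_dates, trading_date)
    if position == 0:
        return None
    return releases[position - 1][2]


for day in (date(2024, 2, 13), date(2024, 3, 12), date(2024, 3, 13)):
    print(day, value_known_before(MACRO_RELEASES, day))
```

Keying the lookup on the release date instead of the reference month is what keeps a January figure out of features for dates before it was published.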

Section 07

Adaptation to Chinese Markets and Summary Insights

Adaptation to Chinese markets: replace the macro variables (China's 10-year government bond yield, PPI, etc.), adjust for local trading rules (T+1, daily price limits), ensure data timestamps are aligned, and select constituents of the CSI 300 or CSI 500.

Summary: the project demonstrates a complete quantitative investment pipeline. Its core value lies in combining macro and technical features with dynamic risk management, which makes it a useful reference for both learners and practitioners.
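
For the A-share trading rules mentioned in the adaptation notes, a backtest needs checks along the following lines. This sketch assumes a 10% daily price limit (some boards and special-treatment stocks use different limits), and assumes that buy orders cannot fill at limit-up and sell orders cannot fill at limit-down. The function names and those fill rules are illustrative assumptions.

```python
PRICE_LIMIT = 0.10  # assumed daily limit; varies by board and stock status


def sellable_shares(shares_held, shares_bought_today):
    """T+1 rule: shares bought today cannot be sold until the next session."""
    return max(shares_held - shares_bought_today, 0)


def is_limit_up(previous_close, price, limit=PRICE_LIMIT):
    """True when the price has reached the upper daily limit."""
    return price >= round(previous_close * (1 + limit), 2)


def is_limit_down(previous_close, price, limit=PRICE_LIMIT):
    """True when the price has reached the lower daily limit."""
    return price <= round(previous_close * (1 - limit), 2)


def can_fill(side, previous_close, price):
    """Assumed fill rule: no buys at limit-up, no sells at limit-down."""
    if side == "buy":
        return not is_limit_up(previous_close, price)
    if side == "sell":
        return not is_limit_down(previous_close, price)
    raise ValueError("side must be 'buy' or 'sell'")


print(sellable_shares(shares_held=1000, shares_bought_today=400))
print(can_fill("buy", previous_close=10.0, price=11.0))
print(can_fill("sell", previous_close=10.0, price=9.5))
```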