Zing Forum

Daily Stress Level Prediction System Based on Two-Tier Stacking Ensemble Learning

This article introduces a machine learning project that predicts users' stress levels using a two-tier stacking ensemble architecture (XGBoost, Random Forest, SVR, and Ridge Regression), based on 55,000 samples and 18 life behavior features.

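Tags: Machine Learning, Ensemble Learning, Stress Prediction, XGBoost, Random Forest, SVR, Ridge Regression, Health Technology, Supervised Learning, Stacking Ensemble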
Published 2026-06-07 20:15 | Recent activity 2026-06-07 20:24 | Estimated read 7 min

Section 01

[Introduction] Core Overview of the Daily Stress Level Prediction System Based on Two-Tier Stacking Ensemble Learning

This project is a supervised learning solution for daily stress prediction, developed by a student team from Hanoi University of Science and Technology. Its core design is a two-tier stacking ensemble: XGBoost, Random Forest, and SVR serve as base models, and a Ridge Regression meta-model combines their predictions. The model is trained on 55,000 samples with 18 lifestyle and behavior features. Because these features come from everyday routines rather than specialized measurements, the data is easier to collect and the predictions fit ordinary daily life.

Section 02

Project Background and Dataset Features

  • Original Author: Ktin06
  • Source: GitHub Project IntroML (Link: https://github.com/Ktin06/IntroML)
  • Course Background: IT3190 Introduction to Machine Learning and Data Mining at Hanoi University of Science and Technology
  • Dataset: 55,000 samples with 18 features, divided into physiological indicators (sleep duration/quality, steps/activity level, calorie expenditure/intake) and lifestyle habits (caffeine intake, working hours/intensity, exercise frequency).

Section 03

Detailed Explanation of the Two-Tier Stacking Ensemble Architecture

Tier 1 (Base Model Layer):

  1. XGBoost: A gradient boosting framework that captures non-linear relationships;
  2. Random Forest: A bagging method that averages the predictions of many decision trees to reduce variance;
  3. SVR: Maps inputs into a high-dimensional space via kernel functions and fits a regression function there, ignoring errors that fall within a small tolerance margin.

Tier 2 (Meta Model Layer): Ridge Regression is used as the meta-learner. It takes the base models' predictions as input and learns a weight for each one, uses L2 regularization to guard against overfitting, and lets the base models compensate for one another's errors in the final prediction.
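
The two tiers described above can be sketched with scikit-learn's `StackingRegressor`. This is a minimal illustration under stated assumptions, not the project's actual code: the data is synthetic (only the 18-feature width matches the article), the hyperparameters are arbitrary, and scikit-learn's `GradientBoostingRegressor` is swapped in for XGBoost so the example runs without an extra dependency (`xgboost.XGBRegressor` could take its place).

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (assumption): 18 features, far fewer rows than 55,000.
X, y = make_regression(
    n_samples=600, n_features=18, n_informative=10, noise=10.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Tier 1: base models. GradientBoostingRegressor is a stand-in for XGBoost.
base_models = [
    ("boosted", GradientBoostingRegressor(n_estimators=100, random_state=0)),
    ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
    # SVR is sensitive to feature scale, so it gets its own scaler.
    ("svr", make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))),
]

# Tier 2: Ridge is fitted on out-of-fold predictions of the base models (cv=5).
stack = StackingRegressor(
    estimators=base_models, final_estimator=Ridge(alpha=1.0), cv=5
)
stack.fit(X_train, y_train)

# One learned weight per base model.
meta_weights = dict(
    zip([name for name, _ in base_models], stack.final_estimator_.coef_)
)
test_r2 = stack.score(X_test, y_test)
print(meta_weights)
print(f"test R^2: {test_r2:.3f}")
```

The Ridge coefficients in `meta_weights` show how much the meta-learner relies on each base model, which is one way to inspect what the second tier has learned.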

Section 04

Technology Stack and Team Collaboration Division

Technology Stack:

  • ML Frameworks: Scikit-learn (algorithm interfaces/evaluation), XGBoost (gradient boosting);
  • Data Processing: Pandas (structured processing), NumPy (numerical computation);
  • Visualization: Matplotlib & Seaborn;
  • Deployment: Streamlit/Gradio (interactive web applications).

Team Division:

  • Leader/Data Engineer: EDA, preprocessing, missing value handling, version control;
  • ML Engineer 1: Train-test split, K-fold cross-validation, base model tuning;
  • ML Engineer 2: Meta-feature extraction, Ridge Regression configuration, performance comparison and evaluation;
  • Full-stack/UI Engineer: Model serialization, web application development, presentation design.
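
The K-fold cross-validation and meta-feature extraction steps in the division above can be sketched by hand with `cross_val_predict`. This is an illustrative sketch only: the data is synthetic, the two base models and their settings are assumptions, and the project's own pipeline may be organized differently.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_predict, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (assumption), 18 features as in the article.
X, y = make_regression(n_samples=500, n_features=18, noise=10.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

# Two hypothetical base models, kept small so the sketch runs quickly.
base_models = {
    "rf": RandomForestRegressor(n_estimators=50, random_state=1),
    "svr": make_pipeline(StandardScaler(), SVR(C=10.0)),
}
kfold = KFold(n_splits=5, shuffle=True, random_state=1)

# Meta-features for training: each row's prediction comes from a model
# that did not see that row, so the meta-learner is not fed leaked targets.
meta_train = np.column_stack(
    [cross_val_predict(m, X_train, y_train, cv=kfold) for m in base_models.values()]
)

# Meta-features for testing: refit each base model on the full training set.
meta_test = np.column_stack(
    [m.fit(X_train, y_train).predict(X_test) for m in base_models.values()]
)

# Tier 2: Ridge learns how to weight the base-model predictions.
meta_model = Ridge(alpha=1.0).fit(meta_train, y_train)
stacked_pred = meta_model.predict(meta_test)
print(meta_train.shape, meta_test.shape, stacked_pred.shape)
```

Using out-of-fold predictions as meta-features is the reason K-fold splitting matters here: fitting the meta-learner on in-sample base predictions would reward whichever base model overfits most.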

Section 05

Model Evaluation and Performance Validation

The project uses multi-dimensional regression metrics for evaluation:

  • RMSE (Root Mean Squared Error): The square root of the mean squared difference between predictions and true values, which penalizes large errors more heavily;
  • MAE (Mean Absolute Error): The average absolute prediction error, in the same units as the target;
  • R² (Coefficient of Determination): The share of variance in the target that the model explains.

The individual baseline models are compared against the stacking ensemble on these metrics to verify the advantage of the two-tier architecture in stress prediction.
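
The three metrics can be written out in a few lines of plain Python to make their definitions concrete. The function name `regression_metrics` and the four sample values are made up for illustration and are not taken from the project.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, and R^2 for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)                # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


# Hypothetical stress scores: three small errors and one larger miss.
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [3.0, 4.0, 5.0, 10.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

For these values MAE is 1.0 while RMSE is about 1.22, because the single error of 2 counts more once it is squared; R² works out to 0.7. In practice the same numbers are available from `sklearn.metrics` (`mean_squared_error`, `mean_absolute_error`, `r2_score`).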

Section 06

Project Application Value and Methodological Insights

  1. Health Value of Daily Data: Daily behavior data can be used to assess stress without professional medical equipment, offering a low-cost path for public health management;
  2. Practical Paradigm for Ensemble Learning: The two-tier stacking architecture shows how different algorithms can be combined so that each offsets the others' weaknesses, an approach that transfers to other prediction tasks;
  3. Balance Between Academia and Engineering: The course project balances theoretical rigor with engineering implementation and delivers a fully working system.

Section 07

Conclusion and Reference Suggestions

The IntroML project is a strong example of machine learning coursework put into practice, applying supervised learning and ensemble learning to a health prediction scenario. Its architecture design, feature engineering, and team collaboration model are useful references. For developers learning how to carry out an ML project, it offers a clear code structure and complete documentation, making it an open-source resource worth studying in depth.