Section 01
[Introduction] End-to-End Machine Learning Practice for Diabetes Prediction Based on Lifestyle Indicators
This project is an end-to-end data science exercise built on the U.S. CDC's 2015 Behavioral Risk Factor Surveillance System (BRFSS) health survey data. It combines unsupervised clustering (K-Means and Gaussian mixture models) with gradient boosting models (XGBoost and LightGBM), and uses SMOTE oversampling to address class imbalance, in order to achieve high-recall diabetes risk prediction. It covers the full workflow: data cleaning, feature engineering, model training, and evaluation.
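To make the class-imbalance step concrete, here is a minimal pure-Python sketch of the idea behind SMOTE: each synthetic minority sample is an interpolation between a real minority sample and one of its nearest minority neighbours. The function name `smote_oversample`, its parameters, and the toy data are all hypothetical and are not taken from the project. In practice a library implementation, such as the `SMOTE` class in imbalanced-learn, would normally be used instead.

```python
import random
from math import dist


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (the core SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # The k nearest neighbours of `base` among the other minority samples.
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy, made-up data: 20 majority points and 4 minority points.
majority = [(float(x), float(y)) for x in range(5) for y in range(4)]
minority = [(10.0, 10.0), (11.0, 10.5), (10.5, 11.0), (11.5, 11.5)]

# Oversample the minority class until both classes have the same size.
new_points = smote_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Because the synthetic points are interpolated, they take fractional values even when the original features are binary or ordinal survey answers, which is worth keeping in mind for survey data such as BRFSS. Oversampling should also be applied only to the training split, so that evaluation reflects the original class distribution.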