The project uses three classic machine learning algorithms: logistic regression, random forests, and gradient boosting. Comparing several models side by side shows which scenarios each algorithm suits and how their performance differs.
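A comparison of this kind could be set up roughly as follows. This is a minimal sketch, not the project's actual pipeline: the data is synthetic (generated with `make_classification`), the model settings are illustrative, and scikit-learn's `GradientBoostingClassifier` stands in for whichever boosting library is actually used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the customer satisfaction table (assumption).
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)

# Illustrative settings only; none of these values come from the project.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Every model is scored on the same data with the same 5 folds,
# so the mean accuracies are directly comparable.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in models.items()
}

for name, score in scores.items():
    print(f"{name}: mean CV accuracy = {score:.3f}")
```

Keeping the data splits identical across models is the important design choice here: differences in score then reflect the algorithms rather than the sampling.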
As the baseline model, logistic regression trains quickly and is easy to interpret. Its feature coefficients show directly which factors have the greatest impact on customer satisfaction, provided the features are on comparable scales. Although a linear model has limited expressive power, it often reaches solid baseline performance when the features are well engineered.
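Reading the coefficients can look something like the sketch below. The feature names and the data are hypothetical: the labels are simulated so that the first feature matters most, the second only weakly, and the third not at all. Features are standardized first so that coefficient magnitudes can be compared.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical feature names, used only for illustration.
feature_names = ["delivery_delay", "support_contacts", "account_age"]

# Simulated data: strong effect of feature 0, weak effect of feature 1,
# no effect of feature 2. Logistic noise makes this a true logistic model.
X = rng.normal(size=(1000, 3))
logits = 2.0 * X[:, 0] - 0.5 * X[:, 1]
y = (logits + rng.logistic(size=1000) > 0).astype(int)

# Standardize, then fit, so coefficients are on a comparable scale.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)

# One coefficient per feature for a binary problem.
coefs = model.named_steps["logisticregression"].coef_[0]

# Rank features by absolute coefficient size.
ranking = sorted(
    zip(feature_names, coefs), key=lambda pair: abs(pair[1]), reverse=True
)
for name, coef in ranking:
    print(f"{name}: {coef:+.2f}")
```

The sign of a coefficient gives the direction of the effect on the log-odds of the positive class, and its magnitude (after standardization) gives a rough sense of relative importance.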
Random forest is an ensemble learning method that reduces the overfitting risk of a single tree by building many decision trees and combining their predictions. Tree models capture nonlinear relationships and interactions between features automatically and are relatively robust to outliers. On tabular tasks such as customer satisfaction prediction, random forests usually perform well.
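The overfitting point can be checked by comparing one unrestricted decision tree with a forest on a held-out split. The sketch below uses synthetic data and arbitrary settings (200 trees, a 70/30 split), all of which are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic tabular data standing in for the real dataset (assumption).
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# A single fully grown tree versus an ensemble of 200 trees.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(
    X_train, y_train
)

# Compare training accuracy with held-out accuracy for each model.
for name, model in [("single tree", tree), ("random forest", forest)]:
    train_acc = model.score(X_train, y_train)
    test_acc = model.score(X_test, y_test)
    print(f"{name}: train={train_acc:.3f} test={test_acc:.3f}")
```

A fully grown tree typically fits the training split almost perfectly while scoring lower on held-out data. The forest usually narrows that gap, though the exact numbers depend on the data.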
Gradient boosted trees (as implemented in XGBoost and LightGBM, for example) are among the most widely used algorithms in Kaggle competitions. They are trained sequentially: each new tree focuses on correcting the prediction errors of the trees before it, gradually improving overall performance. Gradient boosting depends relatively little on feature engineering, but it requires careful tuning of hyperparameters such as learning rate, tree depth, and regularization strength.
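The sequential idea can be illustrated with a hand-rolled loop. This is a simplified sketch, not how XGBoost or LightGBM are implemented: it uses a synthetic regression target and squared error, for which "correcting previous errors" means fitting each new tree to the current residuals. The values of `learning_rate`, `max_depth`, and `n_trees` are arbitrary choices for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic 1-D regression problem (assumption, for illustration only).
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)

learning_rate = 0.1  # shrinks each tree's contribution
max_depth = 2        # keeps every tree a weak learner
n_trees = 50

# Start from a constant model: the mean of the target.
prediction = np.full_like(y, y.mean())
trees, errors = [], []

for _ in range(n_trees):
    # Residuals are what the current ensemble still gets wrong.
    residuals = y - prediction
    tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
    tree.fit(X, residuals)
    # Add a small correction rather than the full tree output.
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)
    errors.append(np.mean((y - prediction) ** 2))

print(f"training MSE after 1 tree:  {errors[0]:.4f}")
print(f"training MSE after {n_trees} trees: {errors[-1]:.4f}")
```

A smaller learning rate generally needs more trees to reach the same training error, and deeper trees fit the residuals faster but overfit more easily, which is why these hyperparameters are tuned together.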