The project may implement multiple classification algorithms for comparison:
Logistic Regression
As a representative linear classifier, logistic regression models the log odds of class membership as a linear function of the features. It is simple, highly interpretable, and a common first choice for establishing a performance baseline.
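To make the linear log-odds relationship concrete, here is a minimal sketch of a binary logistic regression prediction. The coefficients, bias, and input are hypothetical values chosen for illustration, not fitted to any data set.

```python
import math

def predict_proba(weights, bias, x):
    """Return P(y = 1 | x) for a binary logistic regression model."""
    # The linear score z is exactly the log odds of the positive class.
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def log_odds(p):
    """Convert a probability back to log odds."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients and input, for illustration only.
weights = [0.8, -1.5]
bias = 0.2
x = [1.0, 2.0]

p = predict_proba(weights, bias, x)
# Linear score: 0.2 + 0.8 * 1.0 - 1.5 * 2.0 = -2.0
recovered = log_odds(p)
```

Converting the predicted probability back to log odds recovers the linear score, which is why each coefficient can be read as the change in log odds per unit change in its feature.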
K-Nearest Neighbors (KNN)
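An instance-based learning method that classifies a sample by finding its K nearest neighbors in the training set, measured by a distance metric such as Euclidean distance, and taking a majority vote of their labels. The choice of K significantly affects performance: a small K is sensitive to noise, while a large K blurs the boundaries between classes.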
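The following sketch shows one way the neighbor vote could be written with only the standard library. The training points are made-up two-feature values, including one deliberately mislabeled point, and the function name knn_predict is an assumption for this example.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Predict a label by majority vote among the k nearest training points."""
    # Sort training points by Euclidean distance to the query.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Made-up (length, width) values; the fourth point carries a noisy label.
train_X = [(1.4, 0.2), (1.5, 0.2), (1.3, 0.3), (1.7, 0.4),
           (4.7, 1.4), (4.5, 1.5), (4.9, 1.5)]
train_y = ["setosa", "setosa", "setosa", "versicolor",
           "versicolor", "versicolor", "versicolor"]

query = (1.65, 0.4)
pred_k1 = knn_predict(train_X, train_y, query, k=1)  # follows the noisy neighbor
pred_k3 = knn_predict(train_X, train_y, query, k=3)  # outvoted by two clean neighbors
```

With K = 1 the prediction follows the single mislabeled neighbor, while K = 3 lets the two correctly labeled neighbors outvote it, which illustrates why K is usually tuned.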
Support Vector Machine (SVM)
A method that finds the decision boundary (hyperplane) with the maximum margin between classes. It can handle problems that are not linearly separable through the kernel trick. For relatively simple problems like Iris classification, a linear kernel usually achieves good results.
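As a rough illustration of these ideas, the sketch below evaluates a linear decision function, the margin width it implies, and two kernel functions. It does not train an SVM: the weight vector and bias are hypothetical values standing in for a trained model, and the gamma default is an arbitrary choice.

```python
import math

def decision_function(w, b, x):
    """Signed score of x relative to the hyperplane w . x + b = 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, b, x):
    """Classify as +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_function(w, b, x) >= 0 else -1

def margin_width(w):
    """Distance between the two margin boundaries, 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

def linear_kernel(a, c):
    """Plain dot product: the decision boundary stays a hyperplane."""
    return sum(ai * ci for ai, ci in zip(a, c))

def rbf_kernel(a, c, gamma=0.5):
    """Gaussian similarity: allows a non-linear boundary in the input space."""
    squared_distance = sum((ai - ci) ** 2 for ai, ci in zip(a, c))
    return math.exp(-gamma * squared_distance)

# Hypothetical parameters standing in for a trained linear model.
w, b = [1.0, 1.0], -3.0
label_small = predict(w, b, (1.4, 0.2))  # score -1.4, so class -1
label_large = predict(w, b, (4.7, 1.4))  # score 3.1, so class +1
width = margin_width(w)
```

The kernel trick amounts to replacing the dot product with a function such as rbf_kernel, so the same maximum-margin formulation can produce a curved boundary without computing the transformed features explicitly.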
Decision Tree and Random Forest
Decision trees build classification rules by recursively partitioning the feature space. Random forests improve generalization by combining many decision trees trained on resampled data and aggregating their votes. Tree models are insensitive to feature scaling, and a single decision tree is easy to interpret, although a forest of many trees is less so.
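A single partitioning step can be sketched as a search for the threshold that minimizes weighted Gini impurity on one feature. Gini impurity is one common split criterion, assumed here for illustration; the feature values are made up and the helper names are hypothetical. A full tree would repeat this step recursively on each side, and a forest would repeat the whole procedure on resampled data.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best split 'value <= threshold'."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best

# Made-up values for a single feature, in centimeters.
lengths_cm = [1.4, 1.5, 1.3, 4.7, 4.5, 4.9]
labels = ["setosa", "setosa", "setosa", "versicolor", "versicolor", "versicolor"]

threshold_cm, impurity_cm = best_split(lengths_cm, labels)

# Rescaling the feature (centimeters to millimeters) moves the threshold
# but leaves the resulting partition, and therefore the impurity, unchanged.
lengths_mm = [v * 10 for v in lengths_cm]
threshold_mm, impurity_mm = best_split(lengths_mm, labels)
```

Because the split depends only on the ordering of feature values, rescaling a feature changes the threshold but not which samples fall on each side, which is the sense in which tree models are insensitive to feature scaling.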
Naive Bayes
A probabilistic classifier based on Bayes' theorem that assumes features are conditionally independent given the class. Although this assumption rarely holds in practice, the classifier performs surprisingly well on many problems.
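The independence assumption is what lets the class likelihood be written as a product of per-feature terms. The sketch below assumes the Gaussian variant, in which each feature is modeled with a per-class normal distribution; the data are made-up values, and the small constant added to each variance is an illustrative safeguard against division by zero.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate the class prior and per-feature mean and variance for each class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    model = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9  # smoothing term (assumption)
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (n / len(X), means, variances)
    return model

def predict_nb(model, x):
    """Return the class with the highest log posterior for sample x."""
    def log_posterior(prior, means, variances):
        # Independence assumption: per-feature log likelihoods simply add up.
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
            for xi, m, var in zip(x, means, variances)
        )
        return math.log(prior) + log_likelihood
    return max(model, key=lambda label: log_posterior(*model[label]))

# Made-up (length, width) training values for two classes.
X = [(1.4, 0.2), (1.5, 0.2), (1.3, 0.3), (4.7, 1.4), (4.5, 1.5), (4.9, 1.5)]
y = ["setosa", "setosa", "setosa", "versicolor", "versicolor", "versicolor"]

nb_model = fit_gaussian_nb(X, y)
pred_small = predict_nb(nb_model, (1.5, 0.25))
pred_large = predict_nb(nb_model, (4.6, 1.4))
```

Working in log space keeps the product of many small likelihoods numerically stable, and the classifier only needs the ranking of the posteriors to be right, which is one reason it can perform well even when the features are correlated.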