Deep Learning vs. Traditional Machine Learning: A Comprehensive Comparison Between PyTorch and Scikit-Learn on Wine Classification Task

This article provides an in-depth analysis of an open-source project that systematically compares PyTorch neural networks and Scikit-Learn random forests on the UCI Wine Dataset, revealing performance differences and applicable scenarios of the two methodologies across different dataset sizes.

Tags: PyTorch, Scikit-Learn, machine learning comparison, deep learning, random forest, neural network, classification task, UCI dataset
Published 2026-05-31 15:16 | Recent activity 2026-05-31 15:22 | Estimated read 9 min

Introduction: When Deep Learning Meets Traditional Methods

The explosive growth of deep learning in recent years has left many people with the impression that neural networks are replacing all traditional algorithms. Is this view accurate? On small datasets with structured features, do traditional machine learning methods remain competitive?

This article analyzes an open-source project from GitHub that performs a rigorous comparative experiment between a PyTorch neural network and a Scikit-Learn random forest on the classic UCI Wine Classification Dataset. The experimental results may surprise some readers: the traditional method shows a clear advantage in this specific scenario.



Dataset Background: UCI Wine Classification Dataset

The UCI Wine Dataset is a classic benchmark dataset in machine learning teaching and research, derived from the chemical analysis results of three different cultivars of wine grown in the same region of Italy. The dataset has the following characteristics:

  • Number of Samples: 178 records
  • Feature Dimensions: 13 continuous chemical features (including alcohol content, malic acid, ash, alkalinity of ash, magnesium content, total phenols, flavonoids, non-flavonoid phenols, proanthocyanidins, color intensity, hue, OD280/OD315 ratio of diluted wine, proline)
  • Classification Target: 3 wine categories
  • Data Characteristics: All features are numerical, and the class distribution is relatively balanced (59, 71, and 48 samples per class)

This dataset is not large, but its clear chemical correlation between features and targets makes it an ideal testbed for evaluating the performance of classification algorithms.
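As a quick check of the figures above, the dataset can be inspected directly. This is a minimal sketch that assumes scikit-learn's bundled copy of the data (`load_wine`); the article does not say how the project itself loads the file.

```python
from collections import Counter

from sklearn.datasets import load_wine

# Assumption: scikit-learn's bundled copy of the UCI Wine data is used,
# so no download is needed.
wine = load_wine()
X, y = wine.data, wine.target

print(X.shape)                        # (n_samples, n_features)
print(list(wine.feature_names))       # names of the chemical measurements

class_counts = Counter(y.tolist())
print(sorted(class_counts.items()))   # samples per wine cultivar
```

The printed shape and per-class counts should match the sample count, feature count, and class balance listed in the bullets.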



PyTorch Neural Network Architecture

The PyTorch model used in the project is a feedforward neural network, with the following architectural design:

  • Input Layer: 13 neurons (corresponding to the 13 features)
  • Hidden Layer 1: 9 neurons, using ReLU activation function
  • Hidden Layer 2: 10 neurons, using ReLU activation function
  • Output Layer: 3 neurons (corresponding to the 3 categories), using Softmax activation

This architecture is a typical Multi-Layer Perceptron (MLP); although not particularly complex, it is sufficient to capture the non-linear relationships between features. The model training follows a standard supervised learning process, including data standardization (StandardScaler), training/test set split (80/20 ratio), and appropriate hyperparameter tuning.
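The architecture and training steps described above might look roughly like the following. This is a sketch under stated assumptions, not the project's actual code: the optimizer (Adam), learning rate, epoch count, random seeds, and stratified split are all illustrative choices that the article does not specify. The network returns raw logits because `nn.CrossEntropyLoss` applies log-softmax internally; Softmax is applied explicitly only when producing probabilities at inference.

```python
import torch
from torch import nn
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

torch.manual_seed(0)  # assumed seed, for repeatability only

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0  # 80/20 split
)

# Fit the scaler on the training split only, so no test statistics leak in.
scaler = StandardScaler().fit(X_train)
X_train_t = torch.tensor(scaler.transform(X_train), dtype=torch.float32)
X_test_t = torch.tensor(scaler.transform(X_test), dtype=torch.float32)
y_train_t = torch.tensor(y_train, dtype=torch.long)
y_test_t = torch.tensor(y_test, dtype=torch.long)

# 13 -> 9 -> 10 -> 3, matching the layer sizes listed above.
model = nn.Sequential(
    nn.Linear(13, 9), nn.ReLU(),
    nn.Linear(9, 10), nn.ReLU(),
    nn.Linear(10, 3),  # raw logits; the loss applies log-softmax itself
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)  # assumed optimizer

for epoch in range(200):  # assumed epoch count, full-batch updates
    optimizer.zero_grad()
    loss = loss_fn(model(X_train_t), y_train_t)
    loss.backward()
    optimizer.step()

model.eval()
with torch.no_grad():
    probs = torch.softmax(model(X_test_t), dim=1)  # Softmax at inference
    test_accuracy = (probs.argmax(dim=1) == y_test_t).float().mean().item()
print(f"test accuracy: {test_accuracy:.4f}")
```

The accuracy printed here depends on the assumed seed and hyperparameters, so it will not necessarily reproduce the project's reported figure.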


Scikit-Learn Random Forest

As the traditional machine learning model for comparison, the project selected the Random Forest Classifier—an ensemble learning method that improves generalization ability by building multiple decision trees and aggregating their prediction results. The specific configuration is as follows:

  • Number of Base Learners: 100 decision trees
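  • Feature Sampling Strategy: Default random subset selection (scikit-learn's default for classifiers is max_features="sqrt", i.e. about 3 of the 13 features considered at each split)
  • Voting Mechanism: Majority voting across trees (scikit-learn implements this by averaging each tree's predicted class probabilities and choosing the class with the highest mean)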

The advantages of random forests lie in their natural ability to handle high-dimensional data, automatic evaluation of feature importance, and relatively low need for hyperparameter tuning.
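The configuration above maps onto very little code. The sketch below assumes the same 80/20 split as the neural network experiment and an arbitrary `random_state`; neither the seed nor the use of stratification is stated in the article.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

wine = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.2, stratify=wine.target,
    random_state=0,  # assumed seed
)

# 100 trees; every other hyperparameter is left at scikit-learn's default.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
print(f"test accuracy: {forest.score(X_test, y_test):.4f}")

# Impurity-based feature importances come for free after fitting.
ranked = sorted(
    zip(forest.feature_importances_, wine.feature_names), reverse=True
)
for importance, name in ranked[:5]:
    print(f"{name:<30s} {importance:.3f}")
```

No feature scaling is applied here, since tree splits depend only on the ordering of feature values within each column.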



Comparison of Core Performance Metrics

The experimental results present a clear picture:

| Metric | PyTorch Neural Network | Scikit-Learn Random Forest | Winner |
|---|---|---|---|
| Accuracy | 94.44% | 100.00% | Scikit-Learn |
| Precision (Macro Average) | 94.44% | 100.00% | Scikit-Learn |
| Recall (Macro Average) | 94.44% | 100.00% | Scikit-Learn |
| Training Time | ~1.2 seconds | ~0.1 seconds | Scikit-Learn |
| Inference Time | ~0.01 seconds | ~0.005 seconds | Scikit-Learn (slight advantage, per the times shown) |
| Model Size | ~2 KB | ~50 KB | PyTorch |
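For readers who want to compute the same kind of metrics, accuracy and the macro-averaged precision and recall can be obtained with `sklearn.metrics`. The labels below are hypothetical and only illustrate the calculation (36 test samples with two mistakes); they are not the project's predictions.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score


def summarize(y_true, y_pred):
    """Return accuracy plus macro-averaged precision and recall."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision_macro": precision_score(
            y_true, y_pred, average="macro", zero_division=0
        ),
        "recall_macro": recall_score(
            y_true, y_pred, average="macro", zero_division=0
        ),
    }


# Hypothetical labels for illustration only: 36 samples, two mistakes.
y_true = [0] * 12 + [1] * 12 + [2] * 12
y_pred = list(y_true)
y_pred[0] = 1    # a class-0 sample predicted as class 1
y_pred[12] = 2   # a class-1 sample predicted as class 2

metrics = summarize(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Macro averaging computes the metric per class and then takes the unweighted mean, so each of the three wine classes counts equally regardless of its size.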

Interpretation of Results

Complete Victory in Accuracy: Scikit-Learn achieved 100% classification accuracy on this dataset, meaning every test sample was correctly classified. PyTorch's 94.44% is still strong: with an 80/20 split of 178 records the test set holds 36 samples, so this figure is consistent with 34 correct predictions and 2 misclassifications.

Huge Gap in Training Efficiency: Random Forest takes only about 0.1 seconds to complete training, while the PyTorch model takes about 1.2 seconds, a roughly 12-fold difference. The gap is especially visible on a dataset this small, because the neural network repeats forward and backward passes over many epochs to optimize its parameters, whereas each decision tree is grown once with no iterative optimization.
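Training times like these can be measured with a small wall-clock helper. The sketch below is one possible approach, not the project's benchmark code: `time_call` is a hypothetical helper name, and only the random forest is timed here, although the same helper could wrap a PyTorch training loop.

```python
import time

from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier


def time_call(fn, repeats=5):
    """Return the best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


X, y = load_wine(return_X_y=True)
fit_seconds = time_call(
    lambda: RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
)
print(f"random forest fit: {fit_seconds:.3f} s")
```

Taking the best of several runs reduces noise from other processes; absolute numbers will vary with hardware and library versions.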

Reversal in Model Size: Interestingly, the trained PyTorch model file is only about 2 KB, while the Random Forest model is about 50 KB. This reflects a structural difference between the two methods: the 13-9-10-3 network stores just 259 weights and biases, while a Random Forest must store the full structure (split features, thresholds, and leaf values) of all 100 decision trees.
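The size difference can be sanity-checked with a little arithmetic. The sketch below counts the parameters implied by the 13-9-10-3 layout and, for contrast, the total number of tree nodes in a 100-tree forest fitted on the full dataset. The forest's `random_state` is an assumption, and on-disk sizes depend on the serialization format, so no file sizes are computed here.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier

# Each dense layer has n_in * n_out weights plus n_out biases.
layer_sizes = [13, 9, 10, 3]
param_count = sum(
    n_in * n_out + n_out
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
)
raw_bytes = param_count * 4  # float32 uses 4 bytes per parameter
print(param_count, raw_bytes)  # 259 1036

# A forest stores every node of every tree (assumed seed for repeatability).
X, y = load_wine(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
total_nodes = sum(tree.tree_.node_count for tree in forest.estimators_)
print(total_nodes)
```

About 1 KB of raw float32 parameters plus serialization overhead is in line with the roughly 2 KB file reported for the PyTorch model.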