Zing Forum

Automatic Fruit Quality Classification System: A Comparative Study Between Traditional Machine Learning and Deep Learning

An in-depth analysis of a computer vision-based automatic fruit quality classification system. This article details its complete technical workflow, including data preprocessing, feature extraction, YOLOv8 detection, and comparative experiments of three models (SVM, XGBoost, and CNN), and finally proposes a hybrid architecture scheme suitable for industrial scenarios.

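Tags: Computer Vision, Fruit Classification, Machine Learning, Deep Learning, CNN, XGBoost, SVM, Object Detection, Agricultural Automation, Quality Inspection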
Published 2026-06-07 07:15 | Recent activity 2026-06-07 07:22 | Estimated read 7 min

Section 01

Introduction to Automatic Fruit Quality Classification System: A Comparative Study Between Traditional Machine Learning and Deep Learning

This article introduces a computer vision-based system that automates fruit quality inspection and grading for agricultural and industrial settings. The system assigns each fruit a commercial quality grade (good/average/poor) and a size category (small/medium/large). After comparing traditional machine learning models (SVM, XGBoost) with a deep learning model (CNN), it proposes a hybrid architecture suited to industrial use. The project was published on GitHub by karoldmejia et al. on 2026-06-06.

Section 02

Background: Project Objectives and Dataset Construction

Project Core Objectives:

  1. Classify fruits into three commercial grades (good, average, poor) based on appearance features.
  2. Assign each fruit to a small, medium, or large size category based on pixel area.
  3. Compare the performance of traditional ML and deep learning models.
  4. Analyze the impact of geometric and color features on the models.
  5. Propose an industrially feasible scheme.

Dataset: The dataset contains images of six types of fruit (apples, bananas, guavas, lemons, oranges, and pomegranates), with a total of 36,848 samples. Quality labels are manually annotated. Size labels are derived from the normalized pixel area of each fruit using percentiles computed separately for each fruit type: small is below the 33rd percentile, medium lies between the 33rd and 66th percentiles, and large is above the 66th percentile. Using type-specific thresholds keeps the size labels consistent within each fruit type.
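The percentile-based size labeling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: the function name `size_labels`, its parameters, and the sample values are hypothetical, and the handling of values that fall exactly on a threshold is a choice made here because the source does not specify it.

```python
import numpy as np

def size_labels(areas, fruit_types, low_pct=33.0, high_pct=66.0):
    """Label samples small/medium/large using percentiles computed per fruit type.

    Assumption: values equal to the upper threshold count as "medium".
    """
    areas = np.asarray(areas, dtype=float)
    fruit_types = np.asarray(fruit_types)
    labels = np.empty(areas.shape, dtype=object)
    for fruit in np.unique(fruit_types):
        idx = fruit_types == fruit
        # Thresholds are computed only from samples of this fruit type.
        low, high = np.percentile(areas[idx], [low_pct, high_pct])
        labels[idx] = np.where(
            areas[idx] < low, "small",
            np.where(areas[idx] <= high, "medium", "large"),
        )
    return labels

# Hypothetical normalized pixel areas for two fruit types.
demo_areas = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.01, 0.02, 0.03]
demo_types = ["apple"] * 6 + ["lemon"] * 3
demo_labels = size_labels(demo_areas, demo_types)
```

Because thresholds are computed per type, a lemon with a normalized area of 0.03 is labeled large even though it is smaller than every apple in the sample.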

Section 03

Methodology: Image Processing and Feature Extraction Workflow

Image Processing Stage:

  1. Run YOLOv8 object detection to locate fruit regions.
  2. Analyze the image in HSV color space to enhance contrast.
  3. Apply contour detection to extract fruit boundaries.
  4. Use the watershed algorithm to separate overlapping fruits.
  5. Filter out invalid samples and resize the remaining images to 224×224 pixels.
  6. Apply data augmentation (rotation, flipping, brightness adjustment, etc.) to minority-class samples.

Feature Extraction: Geometric features (pixel area, aspect ratio, coverage rate) are used for size classification. Color features are used for quality classification: first-order statistics in HSV space, such as mean and standard deviation, and second-order statistics, such as texture features and color consistency.
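As a rough illustration of the feature extraction step, the sketch below computes geometric features from a binary fruit mask and first-order HSV statistics over the masked pixels, using NumPy only. The function names are hypothetical, and "coverage" is assumed here to mean the fraction of the bounding box filled by the fruit; the source does not define the term. The image is assumed to be already converted to HSV, and second-order texture features are not covered.

```python
import numpy as np

def geometric_features(mask):
    """Pixel area, bounding-box aspect ratio, and coverage of a binary mask."""
    mask = np.asarray(mask, dtype=bool)
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("mask contains no foreground pixels")
    height = int(ys.max() - ys.min() + 1)
    width = int(xs.max() - xs.min() + 1)
    area = int(mask.sum())
    return {
        "area": area,
        "normalized_area": area / mask.size,   # share of the whole image
        "aspect_ratio": width / height,
        "coverage": area / (width * height),   # assumed definition
    }

def hsv_statistics(hsv_image, mask):
    """Per-channel mean and standard deviation over the masked pixels."""
    pixels = np.asarray(hsv_image, dtype=float)[np.asarray(mask, dtype=bool)]
    mean = pixels.mean(axis=0)
    std = pixels.std(axis=0)
    names = ("h", "s", "v")
    stats = {f"{n}_mean": float(m) for n, m in zip(names, mean)}
    stats.update({f"{n}_std": float(s) for n, s in zip(names, std)})
    return stats
```

The two dictionaries can be concatenated into one tabular feature row per fruit, which is the kind of input the SVM and XGBoost models consume.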

Section 04

Methodology: Detailed Model Architecture

Traditional Machine Learning Models:

  • SVM: A classic kernel method that finds a maximum-margin separating hyperplane over the geometric and color features.
  • XGBoost: A gradient-boosted decision tree ensemble that automatically captures non-linear feature interactions, has built-in regularization to limit overfitting, and is well suited to tabular features.

Deep Learning Models:

  • CNN: Learns hierarchical visual features directly from raw images, end to end, through convolutional, pooling, and fully connected layers, without manually designed feature extractors.
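To make the tabular branch concrete, here is a minimal sketch of training an SVM on extracted features with scikit-learn. The data is synthetic and merely stands in for the geometric and color features; the RBF kernel, `C`, `gamma`, and the train/test split are assumptions, since the source does not report the hyperparameters that were used.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for tabular features: 3 classes, 4 features per sample.
centers = np.array([[0, 0, 0, 0], [5, 5, 0, 0], [0, 5, 5, 5]], dtype=float)
X = np.vstack([rng.normal(c, 1.0, size=(100, 4)) for c in centers])
y = np.repeat(["small", "medium", "large"], 100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling matters for SVMs because the kernel depends on feature distances.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
svm.fit(X_train, y_train)
accuracy = svm.score(X_test, y_test)
```

A gradient-boosted model such as `xgboost.XGBClassifier` follows the same `fit`/`predict` pattern, although it may expect integer-encoded class labels rather than strings.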

Section 05

Evidence: Experimental Results and Performance Analysis

Quality Classification Results (macro-average F1-score): SVM 0.9319, XGBoost 0.9494, CNN 0.9497 (best, ahead of XGBoost by only 0.0003). Analysis: CNN excels at recognizing complex surface patterns (defects, textures), which makes it suitable for quality assessment.

Size Classification Results: SVM 0.9590, XGBoost 0.9813 (best), CNN 0.9181. Analysis: XGBoost makes effective use of explicit geometric features, which makes it suitable for size classification.
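For readers unfamiliar with the metric, macro-average F1 is the unweighted mean of the per-class F1 scores, so each class counts equally regardless of how many samples it has. The sketch below computes it from scratch; the function name and the toy labels are illustrative and are not taken from the project's evaluation.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy example: per-class F1 is 0.5 (good), 2/3 (average), 0.8 (poor).
y_true = ["good", "good", "average", "poor", "poor", "poor"]
y_pred = ["good", "average", "average", "poor", "poor", "good"]
score = macro_f1(y_true, y_pred)
```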

Section 06

Conclusion: Hybrid Architecture Scheme and Advantages

Core Findings: Each model family has an advantage on a specific task. CNN suits tasks that depend on spatial information and texture (quality), while traditional models suit tasks driven by explicit quantitative features (size).

Hybrid Architecture: Input image → YOLOv8 detection → Feature extraction → Branch to XGBoost (size classification) and CNN (quality classification) → Combined output.

Advantages: Each task uses its best-scoring model (XGBoost size F1 = 0.9813, CNN quality F1 = 0.9497), efficiency is improved, the size branch stays interpretable through XGBoost feature importance, and each module can be updated independently.
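The branching structure of the hybrid architecture can be sketched as a small orchestration class. Everything here is hypothetical: the class and field names are invented for illustration, the four callables are placeholders for a YOLOv8 wrapper, a geometric feature extractor, an XGBoost size model, and a CNN quality model, and feeding the CNN the cropped image rather than extracted features is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence

@dataclass
class FruitResult:
    size: str
    quality: str

@dataclass
class HybridGrader:
    detect: Callable[[Any], List[Any]]                  # image -> fruit crops
    extract_geometry: Callable[[Any], Sequence[float]]  # crop -> tabular features
    classify_size: Callable[[Sequence[float]], str]     # tabular branch
    classify_quality: Callable[[Any], str]              # image branch

    def grade(self, image: Any) -> List[FruitResult]:
        results = []
        for crop in self.detect(image):
            features = self.extract_geometry(crop)
            results.append(
                FruitResult(
                    size=self.classify_size(features),
                    quality=self.classify_quality(crop),
                )
            )
        return results

# Stub components so the sketch runs without any trained model.
grader = HybridGrader(
    detect=lambda image: list(image),
    extract_geometry=lambda crop: [float(len(crop))],
    classify_size=lambda f: "large" if f[0] > 5 else "small",
    classify_quality=lambda crop: "good",
)
graded = grader.grade(["banana", "fig"])
```

Keeping the two branches behind separate callables is what allows either model to be retrained or swapped without touching the other.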

Section 07

Application Value and Technology Stack

Practical Applications: Automation of fruit sorting lines, quality grade determination, packaging specification allocation, yield statistics, and quality control.

Economic Benefits: Lower labor costs, more consistent grading, faster processing, and reduced losses.

Technology Stack: OpenCV (image processing), NumPy (numerical computation), Pandas (data processing), Scikit-Learn (SVM), XGBoost (gradient boosting), TensorFlow/Keras (CNN), Albumentations (data augmentation), YOLOv8 (object detection), Matplotlib (visualization).