Section 01
Introduction to the Comparative Study of CNN and Vision Transformer for Fruit Image Classification
This study systematically compares deep learning models on fruit image classification, evaluating how traditional Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) differ in performance. It covers three key techniques: data augmentation, transfer learning, and model fine-tuning. The study addresses four core questions: whether ViTs can outperform CNNs on small and medium-sized datasets; how data augmentation and fine-tuning affect each architecture differently; how the two architectures trade off training efficiency, inference speed, and final accuracy; and how well transfer learning works for ViTs. The answers are intended to provide empirical evidence for model selection.