Section 01
[Introduction] Core Overview of a Parameter-Matched Comparative Study of CNN, ViT, and CCT in Data-Scarce Scenarios
This study, carried out by a student team from Bocconi University, systematically compares three architectures (CNN, ViT, and CCT) on the CIFAR-10 dataset across training-set sizes from 10% to 100% of the data and across augmentation strategies, with parameter counts strictly matched at two scales (~0.75M and ~5M). The key findings are: the CNN has a significant advantage in low-data scenarios; ViT needs sufficient data to perform well; CCT shows stable performance; and there is a "low-data augmentation crossover", in which the CNN benefits more from augmentation when data is scarce while ViT benefits more when data is plentiful. The study offers practical guidelines for choosing an architecture under different data scenarios.
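To make the "different data volumes" setup concrete, here is a minimal sketch of how a 10% training subset could be drawn. It assumes class-balanced (stratified) sampling, which the summary does not confirm. The function name `stratified_subset`, the seed, and the toy label list are hypothetical stand-ins for illustration, not the study's code. Only the CIFAR-10 training-set shape (10 classes, 5,000 images each) is taken as given.

```python
import random
from collections import defaultdict


def stratified_subset(labels, fraction, seed=0):
    """Return sorted indices of a class-balanced subset covering `fraction` of the data.

    Hypothetical helper: the study's actual sampling procedure is not described here.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed so the subset is reproducible

    # Group example indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Keep the same fraction of every class (at least one example per class).
    chosen = []
    for label in sorted(by_class):
        indices = by_class[label]
        keep = max(1, round(len(indices) * fraction))
        chosen.extend(rng.sample(indices, keep))
    return sorted(chosen)


# Toy stand-in for the CIFAR-10 training labels: 10 classes, 5,000 images each.
toy_labels = [i % 10 for i in range(50_000)]
subset_10 = stratified_subset(toy_labels, 0.10)
```

Keeping the per-class proportion fixed means that a model's accuracy at 10% of the data is not confounded by an accidental class imbalance in the subset. The resulting indices could then be passed to whatever dataset wrapper the training code uses.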