Zing Forum

From Scratch Reproduction of VGG16 and GoogLeNet: A Complete PyTorch CNN Learning Practice Record

This project documents how a deep learning beginner reproduces the classic CNN models VGG16 and GoogLeNet by hand. It covers the full path: understanding the model principles, building the network structures, training and tuning on FashionMNIST, and applying the models to the Cats vs. Dogs dataset.

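Tags: PyTorch, VGG16, GoogLeNet, convolutional neural network, deep learning, transfer learning, CNN reproduction, FashionMNIST, image classification, machine learning practice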
Published 2026-06-03 12:09 | Recent activity 2026-06-03 12:20 | Estimated read 7 min

Section 01

Project Introduction: Reproducing VGG16 and GoogLeNet from Scratch in PyTorch

This project records how a deep learning beginner reproduced the classic CNN models VGG16 and GoogLeNet by hand, from understanding the model principles and building the network structures to training and tuning on FashionMNIST and applying the models to the Cats vs. Dogs dataset. It comes from the GitHub repository PyTorch-CNN-2 (author: yhe8479-ship-it, release date: June 3, 2026). The goal is to understand the models' design ideas and engineering details by implementing them manually rather than calling ready-made APIs.

Section 02

Why Manually Reproduce Classic Models?

Calling a ready-made model API hides the internal mechanisms. Reproducing a model by hand has three benefits:

  1. Understanding the design ideas: for example, why VGG16 stacks small convolution kernels and why GoogLeNet uses the multi-branch Inception structure.
  2. Building engineering skill: turning the structures described in the papers into PyTorch code, including details such as feature-map size calculation and channel matching.
  3. Developing debugging intuition: training problems are much quicker to locate once the internal structure is understood.

Section 03

Core Design and Implementation Details of VGG16 and GoogLeNet

VGG16:

  • Core: Stack several 3×3 convolutions in place of one large kernel. Two stacked 3×3 layers cover the same 5×5 receptive field as a single 5×5 layer while using 18C² weights instead of 25C² (for C input and output channels, about 28% fewer), and the extra activation between them adds non-linearity.
  • Structure: 13 convolutional layers + 3 fully connected layers, with the convolutions grouped into 5 blocks. A 224×224 input becomes a 7×7×512 feature map after the 5 MaxPool operations, since each one halves the height and width.
  • Initialization: Kaiming initialization for convolutional layers, normal-distribution initialization for fully connected layers, biases set to 0.
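
The structure and initialization described above can be sketched in PyTorch as follows. This is a minimal illustration rather than the repository's actual code: the names `make_vgg_block`, `init_weights`, and the `hidden` parameter are hypothetical, the per-block layer counts follow the standard VGG16 configuration, and the dropout rate and the `std=0.01` used for the fully connected layers are assumptions.

```python
import torch
import torch.nn as nn

# (number of 3x3 convs, output channels) for each of the 5 blocks: 2+2+3+3+3 = 13 convs.
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def make_vgg_block(in_channels: int, out_channels: int, num_convs: int) -> nn.Sequential:
    """Stack `num_convs` 3x3 conv + ReLU layers, then halve H and W with MaxPool."""
    layers = []
    for _ in range(num_convs):
        # padding=1 keeps the spatial size unchanged for a 3x3 kernel.
        layers.append(nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1))
        layers.append(nn.ReLU(inplace=True))
        in_channels = out_channels
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)


def init_weights(module: nn.Module) -> None:
    """Kaiming init for convs, normal init for linear layers, zero biases."""
    if isinstance(module, nn.Conv2d):
        nn.init.kaiming_normal_(module.weight, nonlinearity="relu")
        if module.bias is not None:
            nn.init.zeros_(module.bias)
    elif isinstance(module, nn.Linear):
        nn.init.normal_(module.weight, mean=0.0, std=0.01)  # std is an assumption
        nn.init.zeros_(module.bias)


class VGG16(nn.Module):
    def __init__(self, in_channels: int = 3, num_classes: int = 10, hidden: int = 4096):
        super().__init__()
        blocks = []
        for num_convs, out_channels in VGG16_BLOCKS:
            blocks.append(make_vgg_block(in_channels, out_channels, num_convs))
            in_channels = out_channels
        self.features = nn.Sequential(*blocks)
        # Assumes a 224x224 input, so the feature map entering the classifier is 512x7x7.
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512 * 7 * 7, hidden), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(hidden, num_classes),
        )
        self.apply(init_weights)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))
```

For grayscale FashionMNIST images resized to 224×224, a call such as `VGG16(in_channels=1, num_classes=10)` would fit; a smaller `hidden` value is one way to reduce memory when resources are tight.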

GoogLeNet:

  • Core: The Inception module, which runs four branches in parallel (1×1 Conv, 1×1→3×3, 1×1→5×5, MaxPool→1×1) to extract features at multiple scales.
  • Role of 1×1 convolution: Adds non-linearity, reduces the channel dimension to cut parameters, and fuses information across channels.
  • Difficulty: Every branch must produce the same output height and width so that the results can be concatenated along the channel dimension; only the channel counts may differ.
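
A minimal sketch of such an Inception module is shown below, under the assumption that each branch uses stride 1 and "same"-style padding so the spatial size is preserved. The class and argument names (`Inception`, `c3_reduce`, `pool_proj`, and so on) are illustrative and not taken from the repository.

```python
import torch
import torch.nn as nn


def conv_relu(in_ch: int, out_ch: int, kernel_size: int, padding: int = 0) -> nn.Sequential:
    """Convolution followed by ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding),
        nn.ReLU(inplace=True),
    )


class Inception(nn.Module):
    def __init__(self, in_ch, c1, c3_reduce, c3, c5_reduce, c5, pool_proj):
        super().__init__()
        # Branch 1: 1x1 conv.
        self.branch1 = conv_relu(in_ch, c1, 1)
        # Branch 2: 1x1 reduction, then 3x3 conv (padding=1 keeps H and W).
        self.branch2 = nn.Sequential(
            conv_relu(in_ch, c3_reduce, 1),
            conv_relu(c3_reduce, c3, 3, padding=1),
        )
        # Branch 3: 1x1 reduction, then 5x5 conv (padding=2 keeps H and W).
        self.branch3 = nn.Sequential(
            conv_relu(in_ch, c5_reduce, 1),
            conv_relu(c5_reduce, c5, 5, padding=2),
        )
        # Branch 4: 3x3 max pool with stride 1 (keeps H and W), then 1x1 projection.
        self.branch4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            conv_relu(in_ch, pool_proj, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        outputs = [self.branch1(x), self.branch2(x), self.branch3(x), self.branch4(x)]
        # All branches share H and W, so they can be concatenated on the channel axis.
        return torch.cat(outputs, dim=1)
```

With the channel settings of the first Inception module in the GoogLeNet paper, `Inception(192, 64, 96, 128, 16, 32, 32)` maps a 192-channel input to 64 + 128 + 32 + 32 = 256 channels at the same spatial size.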

Section 04

Practice Process from FashionMNIST to Cats vs. Dogs Classification

FashionMNIST Training:

  • Dataset: Grayscale clothing images in 10 categories, preprocessed (resize, tensor conversion) and split into training/validation/test sets.
  • Process: Load batches with DataLoader → forward pass → CrossEntropyLoss → Adam optimization → save the best model.
  • Parameter tuning strategy (under limited resources): Adjust the batch size and learning rate, reduce the number of epochs, and use the Adam optimizer.
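
The training process above can be sketched as a short loop. This is an illustrative outline, not the author's script: the function names are hypothetical, "best" is assumed to mean highest validation accuracy, and random tensors shaped like FashionMNIST images stand in for the real dataset so the example runs without a download.

```python
import copy

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train(model, train_loader, val_loader, epochs=3, lr=1e-3, device="cpu"):
    """Train with CrossEntropyLoss + Adam and keep the weights with the best validation accuracy."""
    model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    best_acc, best_state = -1.0, None

    for _ in range(epochs):
        model.train()
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()

        # Evaluate on the validation set without tracking gradients.
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for images, labels in val_loader:
                images, labels = images.to(device), labels.to(device)
                preds = model(images).argmax(dim=1)
                correct += (preds == labels).sum().item()
                total += labels.size(0)
        acc = correct / total

        if acc > best_acc:
            # Copy the weights so later epochs do not overwrite the best snapshot.
            best_acc, best_state = acc, copy.deepcopy(model.state_dict())

    return best_acc, best_state


def make_fake_loaders(n_train=64, n_val=32, batch_size=16):
    """Random 1x28x28 tensors with 10 classes, standing in for FashionMNIST."""
    gen = torch.Generator().manual_seed(0)

    def make(n):
        images = torch.randn(n, 1, 28, 28, generator=gen)
        labels = torch.randint(0, 10, (n,), generator=gen)
        return TensorDataset(images, labels)

    train_loader = DataLoader(make(n_train), batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(make(n_val), batch_size=batch_size)
    return train_loader, val_loader
```

The returned state dict can then be written to disk with `torch.save(best_state, "best_model.pth")`, where the file name is only an example. Batch size, learning rate, and epoch count are exposed as arguments because those are the knobs the tuning strategy adjusts.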

Cats vs. Dogs Dataset Transfer:

  • Steps: Split the dataset → compute the per-channel mean and standard deviation for normalization → change the final fully connected layer to output 2 classes → retrain → evaluate on the test set → demonstrate prediction on a single image.
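
Three of these steps lend themselves to a short sketch: computing the normalization statistics, swapping the output layer, and predicting a single image. The helper names are hypothetical, the classifier is assumed to be an `nn.Sequential` whose last layer is `nn.Linear`, and the class order `("cat", "dog")` is an assumption that depends on how the dataset folders are labeled.

```python
import torch
import torch.nn as nn


def channel_mean_std(images: torch.Tensor):
    """Per-channel mean and std for a batch of images shaped (N, C, H, W)."""
    mean = images.mean(dim=(0, 2, 3))
    std = images.std(dim=(0, 2, 3), unbiased=False)
    return mean, std


def replace_head(classifier: nn.Sequential, num_classes: int = 2) -> nn.Sequential:
    """Swap the final Linear layer for one with `num_classes` outputs."""
    last = classifier[-1]
    classifier[-1] = nn.Linear(last.in_features, num_classes)
    return classifier


def predict_one(model: nn.Module, image: torch.Tensor, mean, std, class_names=("cat", "dog")) -> str:
    """Normalize one (C, H, W) image, run the model, and return the predicted class name."""
    model.eval()
    x = (image - mean[:, None, None]) / std[:, None, None]
    with torch.no_grad():
        logits = model(x.unsqueeze(0))  # add the batch dimension
    return class_names[logits.argmax(dim=1).item()]
```

In practice the statistics would be computed over the training split only, and for a dataset too large to hold in memory they would be accumulated batch by batch rather than from a single tensor.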

Section 05

Ability Improvements Brought by the Project

Skills developed through the project:

  1. Understanding model structure: Building the VGG block and the Inception module by hand to learn how they work internally.
  2. Experiment debugging and optimization: Working around compute and memory limits in VGG16 training and adjusting hyperparameters.
  3. Transfer application: Extending the models from FashionMNIST to the Cats vs. Dogs binary classification task.
  4. Engineering organization: Keeping code, papers, and slides in a structured layout.
  5. Technical communication: Writing the README and preparing slides to practice technical writing.

Section 06

Advice for Deep Learning Beginners

  1. Don't rush to call ready-made models: Implementing classic models by hand teaches more.
  2. Value the mathematical foundations: Understand the math behind convolution, pooling, and backpropagation.
  3. Start with simple datasets: FashionMNIST is well suited to verifying a model quickly.
  4. Record the learning process: Write a README and make slides; producing output reinforces learning.
  5. Try transferring to your own data: After verifying a model on a standard dataset, apply it to data of your own.

Section 07

Project Conclusion: Learning Value of Classic Models

The value of this project lies in the learning path it demonstrates: from theory to code, from standard datasets to practical applications, and from parameter tuning to a written summary. For beginners, mastering classic models such as VGG16 and GoogLeNet matters more than chasing SOTA results, because their design ideas still influence deep learning today. As the author said: "The value of classic CNNs is not just accuracy, but more about the design ideas behind the structure and how to write trainable and transferable code."