From Perceptron to Convolutional Neural Network: The Evolution of MNIST Handwritten Digit Recognition

This article compares three models on the MNIST handwritten digit recognition task (a single-layer perceptron, a multi-layer neural network, and a convolutional neural network) and shows how model complexity affects image classification performance.

Tags: Deep Learning, Convolutional Neural Network, MNIST, Image Recognition, Perceptron, Neural Network, TensorFlow, Keras, Machine Learning
Published 2026-06-09 12:45 | Recent activity 2026-06-09 12:49 | Estimated read 7 min

Section 01

From Perceptron to CNN: The Evolution of MNIST Handwritten Digit Recognition (Introduction)

Original Author/Maintainer: GitForTiger
Source Platform: GitHub
Original Project Title: MNIST-classification-perceptron-vs-ANN-vs-CNN
Original Link: https://github.com/GitForTiger/MNIST-classification-perceptron-vs-ANN-vs-CNN
Publication Time: June 9, 2026

This article compares three models of increasing complexity on the MNIST handwritten digit recognition task: a single-layer perceptron, a multi-layer artificial neural network (ANN), and a convolutional neural network (CNN). It shows how added depth and architectural changes improve image classification accuracy.

Section 02

Background: MNIST Dataset and Handwritten Digit Recognition Challenges

The MNIST dataset is one of the best-known benchmark datasets in machine learning. It contains 70,000 grayscale images of handwritten digits (60,000 for training and 10,000 for testing), each 28×28 pixels, in 10 classes from 0 to 9. Since its release in 1998, it has been a standard first benchmark for new algorithms and model architectures. Handwritten digit recognition is a classic challenge because stroke position and shape vary from writer to writer, so a model must generalize across writing styles.

Section 03

Data Preprocessing: Preparing for Model Training

Key steps in data preprocessing:

  1. Normalization: Scale pixel values from the range 0-255 to 0-1 to improve training stability and convergence speed.
  2. Data Reshaping: The perceptron and ANN take each image flattened into a 784-element vector (28×28 = 784), while the CNN keeps the 28×28×1 3D tensor to preserve spatial structure.
  3. Label One-Hot Encoding: Convert integer class labels (0-9) into 10-element one-hot vectors for training with the categorical cross-entropy loss function.
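
The three preprocessing steps can be sketched as follows. This is a minimal illustration, not the project's code: it uses plain NumPy rather than the Keras utilities the original presumably relies on, the function name `preprocess` is hypothetical, and random pixels stand in for the real MNIST arrays.

```python
import numpy as np

def preprocess(images, labels, num_classes=10):
    """Return (flat inputs, CNN inputs, one-hot labels) for uint8 images of shape (N, 28, 28)."""
    # 1. Normalization: scale pixel values from [0, 255] to [0.0, 1.0].
    x = images.astype("float32") / 255.0
    # 2a. Perceptron/ANN input: flatten each image into a 784-element vector.
    x_flat = x.reshape(len(x), -1)
    # 2b. CNN input: keep the 2D layout and add a trailing channel axis.
    x_cnn = x[..., np.newaxis]
    # 3. One-hot encoding: row i of the identity matrix is the one-hot vector for class i.
    y_onehot = np.eye(num_classes, dtype="float32")[labels]
    return x_flat, x_cnn, y_onehot

# Synthetic stand-in for MNIST (random pixels), used only to show the shapes.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
fake_labels = np.array([3, 0, 9, 1, 3])

x_flat, x_cnn, y_onehot = preprocess(fake_images, fake_labels)
print(x_flat.shape, x_cnn.shape, y_onehot.shape)  # (5, 784) (5, 28, 28, 1) (5, 10)
```

The same normalized array feeds both input layouts, so the models differ only in how they see the pixels, not in the pixel values themselves.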

Section 04

Model Architecture and Performance

Single-Layer Perceptron

Architecture: Flatten layer + 10-neuron fully connected layer (Softmax activation). Trained with the SGD optimizer and categorical cross-entropy loss; test accuracy is 90.97%. Key observation: The model is a linear classifier, so it cannot capture non-linear relationships or the spatial structure of the image.
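
To make the "linear classifier" point concrete, here is a hedged NumPy sketch of the forward pass such a model computes: one linear map followed by softmax. The weights are random and untrained, and the names `softmax` and `perceptron_forward` are illustrative rather than taken from the project, which appears to use Keras.

```python
import numpy as np

def softmax(z):
    # Subtract the row-wise max for numerical stability before exponentiating.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def perceptron_forward(x_flat, W, b):
    """Softmax over a single linear map: one weight vector per digit class."""
    return softmax(x_flat @ W + b)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(784, 10))  # untrained random weights (illustration only)
b = np.zeros(10)

x = rng.random((4, 784))        # four fake flattened images in [0, 1)
probs = perceptron_forward(x, W, b)
n_params = W.size + b.size      # 784 * 10 weights + 10 biases
print(probs.shape, n_params)    # (4, 10) 7850
```

Every class score is a weighted sum of raw pixels, so shuffling the pixel order consistently across all images would not change what the model can learn. That is one way to see that it ignores spatial structure.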

Multi-Layer Neural Network

Architecture: Flatten → 128-neuron ReLU hidden layer → 64-neuron layer → 10-neuron Softmax output. Trained with the Adam optimizer; test accuracy is 97.78%. Key observation: Non-linear activations and stacked hidden layers let the network learn hierarchical features.
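
A forward pass through this layer stack can be sketched in NumPy as below. This is an assumption-laden illustration: the source does not state the activation of the 64-neuron layer, so ReLU is assumed, the weights are random, and the helper names (`init_layers`, `mlp_forward`) are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def init_layers(sizes, rng):
    """Create (W, b) pairs for consecutive layer sizes, e.g. [784, 128, 64, 10]."""
    return [(rng.normal(scale=0.05, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def mlp_forward(x, layers):
    # Hidden layers use ReLU (assumed for the 64-unit layer); the last layer uses softmax.
    for W, b in layers[:-1]:
        x = relu(x @ W + b)
    W, b = layers[-1]
    return softmax(x @ W + b)

rng = np.random.default_rng(0)
layers = init_layers([784, 128, 64, 10], rng)
probs = mlp_forward(rng.random((4, 784)), layers)
n_params = sum(W.size + b.size for W, b in layers)
print(probs.shape, n_params)  # (4, 10) 109386
```

Without the ReLU between layers, the three matrix products would collapse into a single linear map and the network would be no more expressive than the perceptron.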

Convolutional Neural Network

Architecture: Conv2D(32) → MaxPool → Conv2D(64) → MaxPool → Flatten → 128-neuron layer (Dropout 0.5) → 10-neuron output. Test accuracy is 99.29%. Key observation: Convolutions extract spatial features directly from the pixel grid. Weight sharing and local connectivity make those features translation-equivariant, and pooling adds a degree of translation invariance.
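
The source does not give kernel sizes, strides, or padding for this network. As a rough sketch only, the trace below assumes 3×3 kernels, stride 1, no padding, and 2×2 non-overlapping pooling, and follows how the tensor shape and parameter count would evolve under those assumptions. All function names are illustrative.

```python
def conv2d_shape(h, w, c_in, filters, k=3):
    """Output shape and parameter count of a stride-1, unpadded k x k convolution (assumed settings)."""
    params = k * k * c_in * filters + filters  # shared kernel weights + one bias per filter
    return (h - k + 1, w - k + 1, filters), params

def maxpool_shape(h, w, c, p=2):
    """Non-overlapping p x p max pooling; it has no trainable parameters."""
    return (h // p, w // p, c)

def dense_params(n_in, n_out):
    return n_in * n_out + n_out

shape = (28, 28, 1)
shape, p_conv1 = conv2d_shape(*shape, filters=32)   # -> (26, 26, 32)
shape = maxpool_shape(*shape)                       # -> (13, 13, 32)
shape, p_conv2 = conv2d_shape(*shape, filters=64)   # -> (11, 11, 64)
shape = maxpool_shape(*shape)                       # -> (5, 5, 64)
flat = shape[0] * shape[1] * shape[2]
p_dense = dense_params(flat, 128)
p_out = dense_params(128, 10)
total = p_conv1 + p_conv2 + p_dense + p_out

print(shape, flat)                        # (5, 5, 64) 1600
print(p_conv1, p_conv2, p_dense, p_out)   # 320 18496 204928 1290
print(total)                              # 225034
```

Under these assumed settings the two convolutional layers together hold fewer than 19,000 parameters, and most of the total sits in the 128-neuron dense layer. Different kernel sizes or padding would change the numbers.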

Section 05

Performance Comparison and In-depth Analysis

Comparison of key metrics for the three models:

| Metric | Perceptron | ANN | CNN |
|---|---|---|---|
| Test Accuracy | 90.97% | 97.78% | 99.29% |
| Learning Type | Linear | Non-linear | Spatial Feature Learning |
| Complexity | Low | Medium | High |
| Feature Extraction | Manual/None | Learned | Auto-extracted |
| Performance on Image Data | Medium | Excellent | Outstanding |

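From the perceptron to the CNN, test accuracy rises by 8.32 percentage points and the error rate drops from 9.03% to 0.71%, roughly a 12.7-fold reduction. Parameter counts: the perceptron has 7,850 (784×10 weights plus 10 biases) and the ANN about 110,000. No figure is given for the CNN; weight sharing keeps its convolutional layers compact, since each filter reuses the same small kernel at every image position.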
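
The arithmetic behind these figures can be checked with a few lines of Python. The accuracies come straight from the table; the variable names are illustrative.

```python
# Test accuracy in percent, as reported in the comparison table.
acc = {"Perceptron": 90.97, "ANN": 97.78, "CNN": 99.29}

# Error rate is the complement of accuracy.
err = {name: round(100.0 - a, 2) for name, a in acc.items()}
gain_pp = round(acc["CNN"] - acc["Perceptron"], 2)   # gain in percentage points
err_ratio = err["Perceptron"] / err["CNN"]           # how many times smaller the CNN error is

print(err)                  # {'Perceptron': 9.03, 'ANN': 2.22, 'CNN': 0.71}
print(gain_pp)              # 8.32
print(round(err_ratio, 1))  # 12.7
```

Reporting the error ratio alongside the accuracy gain is useful because near the top of the scale a small accuracy difference can hide a large relative drop in mistakes.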

Section 06

Visualization Insights: Understanding the Model Learning Process

Visualization tools help understand models:

  1. Training Curves: Compare training and validation accuracy over epochs; a widening gap between them signals overfitting.
  2. Loss Curves: Track training and validation loss; validation loss rising while training loss keeps falling is another overfitting signal.
  3. Confusion Matrix: Show pairs of digits that are easily misclassified (e.g., 3&8, 4&9).
  4. Sample Prediction Comparison: See the decision differences between models on individual images and how the CNN captures fine detail.
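
As one example of these diagnostics, a confusion matrix and its most-confused digit pairs can be computed as sketched below. This is a NumPy-only illustration with hand-made toy labels, not real model output, and the helper names are hypothetical.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes=10):
    """cm[i, j] counts samples whose true class is i and predicted class is j."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # unbuffered add, so repeated (i, j) pairs accumulate
    return cm

def most_confused(cm, top=3):
    """Return the largest off-diagonal entries as (true, predicted, count) tuples."""
    off = cm.copy()
    np.fill_diagonal(off, 0)  # ignore correct predictions
    order = np.argsort(off, axis=None)[::-1][:top]
    pairs = [np.unravel_index(k, off.shape) for k in order]
    return [(int(i), int(j), int(off[i, j])) for i, j in pairs if off[i, j] > 0]

# Hand-made toy labels (not real model output).
y_true = np.array([3, 3, 3, 8, 4, 4, 9, 9, 1, 0])
y_pred = np.array([3, 8, 8, 8, 9, 4, 9, 4, 1, 0])

cm = confusion_matrix(y_true, y_pred)
print(cm[3, 8])            # 2  (two 3s were predicted as 8)
print(most_confused(cm))   # the (3, 8) pair comes first; ties may appear in any order
```

With real predictions, `y_pred` would be the argmax of each model's output probabilities on the test set, and comparing the three models' matrices shows which confusions each architecture resolves.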

Section 07

Practical Insights and Future Directions

Practical Insights

  • Prefer CNNs for image tasks: here the CNN reaches 99.29% test accuracy, against 97.78% for the ANN and 90.97% for the perceptron.
  • Start with simple models to establish a baseline, then gradually introduce more complex architectures.
  • Careful data preprocessing (normalization, correct input shapes, label encoding) is the foundation for every model.
  • Visualization is an important tool for model diagnosis and tuning.

Future Directions

  • Introduce data augmentation to improve generalization.
  • Try deeper architectures such as ResNet and DenseNet.
  • Explore transfer learning.
  • Deploy the trained model as a web service.
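
As a small illustration of the data augmentation idea, the sketch below shifts each image by a few pixels with zero fill. The source does not say which augmentations it has in mind, so the technique, the `random_shift` name, and the 2-pixel limit are all assumptions chosen for simplicity.

```python
import numpy as np

def random_shift(images, max_shift=2, rng=None):
    """Shift each (H, W) image by up to max_shift pixels per axis, filling with zeros."""
    rng = np.random.default_rng() if rng is None else rng
    n, h, w = images.shape
    # Zero-pad the spatial axes, then crop a randomly offset H x W window per image.
    padded = np.pad(images, ((0, 0), (max_shift, max_shift), (max_shift, max_shift)))
    out = np.empty_like(images)
    for i in range(n):
        dy, dx = rng.integers(0, 2 * max_shift + 1, size=2)  # crop offset in the padded image
        out[i] = padded[i, dy:dy + h, dx:dx + w]
    return out

rng = np.random.default_rng(0)
batch = rng.random((8, 28, 28)).astype("float32")  # fake normalized images
augmented = random_shift(batch, max_shift=2, rng=rng)
print(augmented.shape)  # (8, 28, 28)
```

Small shifts suit MNIST because the digits sit inside a blank border, so little of the stroke is lost at the edges. Labels are unchanged by this transform.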