From Perceptron to Convolutional Neural Network: The Evolution of MNIST Handwritten Digit Recognition

This article compares three models on the MNIST handwritten digit recognition task: a single-layer perceptron, a multi-layer neural network, and a convolutional neural network. It shows how model complexity affects image classification performance.

Tags: Deep Learning, Convolutional Neural Network, MNIST, Image Recognition, Perceptron, Neural Network, TensorFlow, Keras, Machine Learning

Published 2026-06-09 12:45 | Recent activity 2026-06-09 12:49 | Estimated read 7 min

Section 01

From Perceptron to CNN: The Evolution of MNIST Handwritten Digit Recognition (Introduction)

Original Author/Maintainer: GitForTiger
Source Platform: GitHub
Original Project Title: MNIST-classification-perceptron-vs-ANN-vs-CNN
Original Link: https://github.com/GitForTiger/MNIST-classification-perceptron-vs-ANN-vs-CNN
Publication Time: June 9, 2026

This article compares three models on the MNIST handwritten digit recognition task: a single-layer perceptron, a multi-layer artificial neural network (ANN), and a convolutional neural network (CNN). The comparison shows how added model capacity and architectural changes improve image classification accuracy.

Section 02

Background: MNIST Dataset and Handwritten Digit Recognition Challenges

The MNIST dataset is one of the best-known benchmark datasets in machine learning. It contains 70,000 grayscale images of handwritten digits at 28×28 pixels (10 classes, 0 to 9), split into 60,000 training images and 10,000 test images. Since its release in 1998, it has been a standard first test for new algorithms and model architectures. Handwritten digit recognition is a classic challenge because stroke position, thickness, and shape vary from writer to writer, and a model has to classify correctly despite that variation.

Section 03

Data Preprocessing: Preparing for Model Training

Key steps in data preprocessing:

Normalization: Convert pixel values from the range 0-255 to 0-1 to improve training stability and convergence speed.
Data Reshaping: The perceptron and ANN require each 28×28 image to be flattened into a 784-element vector, while the CNN keeps the 28×28×1 3D tensor to preserve spatial structure.
Label One-Hot Encoding: Convert the digit labels (0 to 9) into 10-element one-hot vectors so the models can be trained with the categorical cross-entropy loss.
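
The three steps above can be sketched in NumPy as follows. This is a minimal illustration rather than the project's own code: the function name `preprocess` is hypothetical, and random arrays stand in for the real MNIST images so the snippet runs without a download.

```python
import numpy as np

def preprocess(images, labels, num_classes=10):
    """Normalize, reshape, and one-hot encode MNIST-style arrays (hypothetical helper)."""
    # Normalization: scale uint8 pixels from [0, 255] to float32 values in [0, 1].
    x = images.astype("float32") / 255.0
    # Flattened view for the perceptron and ANN: one 784-element vector per image.
    x_flat = x.reshape(len(x), -1)
    # Channel-last view for the CNN: 28x28x1 keeps the spatial layout.
    x_cnn = x[..., np.newaxis]
    # One-hot labels: row i has a single 1 in column labels[i].
    y = np.eye(num_classes, dtype="float32")[labels]
    return x_flat, x_cnn, y

# Synthetic stand-in data with the same shape and dtype as MNIST images.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
labels = np.array([3, 0, 9, 1, 3])

x_flat, x_cnn, y = preprocess(images, labels)
print(x_flat.shape, x_cnn.shape, y.shape)  # (5, 784) (5, 28, 28, 1) (5, 10)
```

With the real dataset, the same function would be applied to the training and test arrays separately.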

Section 04

Model Architecture and Performance

Single-Layer Perceptron

Architecture: Flatten layer + 10-neuron fully connected layer (Softmax activation). Trained with the SGD optimizer and categorical cross-entropy loss; test accuracy is 90.97%. Key observation: as a linear classifier, it struggles to capture non-linear relationships and spatial structure.
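
To make the "linear classifier" point concrete, here is a minimal NumPy sketch of the forward pass such a model computes: one matrix multiplication plus a bias, followed by softmax. The function names and the random weights are illustrative assumptions, not taken from the original project.

```python
import numpy as np

def softmax(z):
    # Subtract the row-wise max before exponentiating for numerical stability.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def perceptron_forward(x_flat, W, b):
    """Class probabilities from a single dense layer: softmax(xW + b)."""
    return softmax(x_flat @ W + b)

def cross_entropy(probs, y_onehot, eps=1e-12):
    # Mean categorical cross-entropy over the batch.
    return float(-np.mean(np.sum(y_onehot * np.log(probs + eps), axis=1)))

# Illustrative random inputs and weights (untrained).
rng = np.random.default_rng(0)
x = rng.random((4, 784))                    # four flattened 28x28 images
W = rng.normal(scale=0.01, size=(784, 10))  # one weight column per digit class
b = np.zeros(10)

probs = perceptron_forward(x, W, b)
print(probs.shape, W.size + b.size)  # (4, 10) 7850
```

Because the class scores are a linear function of the pixels, each decision boundary is a hyperplane in the 784-dimensional pixel space, which is the limitation the key observation refers to.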

Multi-Layer Neural Network

Architecture: Flatten → 128-neuron ReLU hidden layer → 64-neuron layer → 10-neuron Softmax output. Trained with the Adam optimizer; test accuracy is 97.78%. Key observation: non-linear activations and hidden layers enable hierarchical feature learning.

Convolutional Neural Network

Architecture: Conv2D(32) → MaxPool → Conv2D(64) → MaxPool → Flatten → 128-neuron layer (Dropout 0.5) → 10-neuron output. Test accuracy is 99.29%. Key observation: convolutions extract spatial features automatically; weight sharing and local connectivity keep the parameter count down and, together with pooling, make the model more robust to small translations of the digit.
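
The three architectures described in this section could be written in Keras roughly as follows. This is a sketch under stated assumptions, not the repository's code: the article does not give kernel sizes, pool sizes, the activations of the convolutional layers, the activation of the 64-neuron and 128-neuron layers, or the CNN's optimizer, so 3×3 kernels, 2×2 pooling, ReLU, and Adam are assumed here. The builder function names are hypothetical.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_perceptron():
    # Flatten + a single 10-unit softmax layer (a linear classifier).
    return keras.Sequential([
        keras.Input(shape=(28, 28)),
        layers.Flatten(),
        layers.Dense(10, activation="softmax"),
    ])

def build_ann():
    # Flatten -> 128 (ReLU) -> 64 -> 10 (softmax).
    return keras.Sequential([
        keras.Input(shape=(28, 28)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(64, activation="relu"),   # ASSUMPTION: ReLU, not stated in the article
        layers.Dense(10, activation="softmax"),
    ])

def build_cnn():
    # Conv2D(32) -> MaxPool -> Conv2D(64) -> MaxPool -> Flatten -> 128 -> Dropout(0.5) -> 10.
    return keras.Sequential([
        keras.Input(shape=(28, 28, 1)),
        layers.Conv2D(32, 3, activation="relu"),  # ASSUMPTION: 3x3 kernel, ReLU
        layers.MaxPooling2D(2),                   # ASSUMPTION: 2x2 pooling
        layers.Conv2D(64, 3, activation="relu"),  # ASSUMPTION: 3x3 kernel, ReLU
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),     # ASSUMPTION: ReLU
        layers.Dropout(0.5),
        layers.Dense(10, activation="softmax"),
    ])

perceptron, ann, cnn = build_perceptron(), build_ann(), build_cnn()

# Optimizers for the perceptron and ANN follow the article; Adam for the CNN is an assumption.
perceptron.compile(optimizer="sgd", loss="categorical_crossentropy", metrics=["accuracy"])
ann.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
cnn.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

print(perceptron.count_params(), ann.count_params(), cnn.count_params())
# 7850 109386 225034
```

The CNN's parameter count in particular depends on the assumed kernel size and padding, so it should not be read as the original project's figure.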

Section 05

Performance Comparison and In-depth Analysis

Comparison of key metrics for the three models:

Metric	Perceptron	ANN	CNN
Test Accuracy	90.97%	97.78%	99.29%
Learning Type	Linear	Non-linear	Spatial Feature Learning
Complexity	Low	Medium	High
Feature Extraction	Manual/None	Learned	Auto-extracted
Performance on Image Data	Medium	Excellent	Outstanding

From the perceptron to the CNN, test accuracy rose by 8.32 percentage points, and the error rate fell from 9.03% to 0.71%, roughly a 13-fold reduction. Parameter counts: the perceptron has 7,850 parameters and the ANN about 110,000; the CNN's convolutional layers stay compact because each filter's weights are shared across the whole image.
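
The figures in this paragraph can be reproduced from the reported accuracies and the layer sizes given in Section 04. The short script below is only a worked check of that arithmetic; `dense_params` is a hypothetical helper that counts the weights and biases of one fully connected layer.

```python
def dense_params(n_in, n_out):
    # A dense layer has one weight per input-output pair plus one bias per output.
    return n_in * n_out + n_out

# Test accuracies reported in the article, in percent.
acc = {"perceptron": 90.97, "ann": 97.78, "cnn": 99.29}

gain = acc["cnn"] - acc["perceptron"]     # improvement in percentage points
err_perceptron = 100 - acc["perceptron"]  # error rate of the perceptron
err_cnn = 100 - acc["cnn"]                # error rate of the CNN
ratio = err_perceptron / err_cnn          # how many times fewer errors

perceptron_params = dense_params(784, 10)
ann_params = dense_params(784, 128) + dense_params(128, 64) + dense_params(64, 10)

print(f"{gain:.2f} {err_perceptron:.2f} {err_cnn:.2f} {ratio:.1f}")  # 8.32 9.03 0.71 12.7
print(perceptron_params, ann_params)                                # 7850 109386
```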

Section 06

Visualization Insights: Understanding the Model Learning Process

Visualization tools help understand models:

Training Curves: Check whether training and validation accuracy track each other; a widening gap between them indicates overfitting.
Loss Curves: Monitor training and validation loss; validation loss rising while training loss keeps falling is another overfitting signal.
Confusion Matrix: Shows which pairs of digits are easily confused (e.g., 3 and 8, 4 and 9).
Sample Prediction Comparison: Gives an intuitive feel for how the models' decisions differ and for the CNN's ability to capture fine detail.
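
As an illustration of the confusion-matrix item above, the following NumPy sketch builds the matrix from true and predicted labels and lists the most frequently confused digit pairs. The function names and the tiny hand-made label arrays are assumptions for demonstration; they are not results from the original project.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes=10):
    """cm[i, j] counts samples whose true class is i and predicted class is j."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # unbuffered add handles repeated index pairs
    return cm

def most_confused(cm, top=3):
    """Return (digit_a, digit_b, count) for the most confused unordered pairs."""
    sym = cm + cm.T            # merge a->b and b->a mistakes
    np.fill_diagonal(sym, 0)   # drop correct predictions
    n = len(sym)
    pairs = [(i, j, int(sym[i, j]))
             for i in range(n) for j in range(i + 1, n) if sym[i, j] > 0]
    return sorted(pairs, key=lambda p: -p[2])[:top]

# Hand-made example labels (not real model output).
y_true = np.array([3, 3, 3, 8, 8, 4, 4, 9, 9, 1])
y_pred = np.array([3, 8, 8, 8, 3, 4, 9, 9, 9, 1])

cm = confusion_matrix(y_true, y_pred)
print(most_confused(cm))          # [(3, 8, 3), (4, 9, 1)]
print(cm.trace() / cm.sum())      # 0.6
```

In practice `y_pred` would come from taking the argmax of each model's predicted probabilities on the test set.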

Section 07

Practical Insights and Future Directions

Practical Insights

Prioritize CNN for image tasks; the performance advantage is significant.
Start with simple models to establish a baseline, then gradually introduce complex architectures.
High-quality data preprocessing is the foundation of success.
Visualization is an important tool for model diagnosis and tuning.

Future Directions

Introduce data augmentation to improve generalization ability.
Try deeper architectures such as ResNet and DenseNet.
Explore transfer learning applications.
Deploy as a web service.