# From Perceptron to Convolutional Neural Network: The Evolution of MNIST Handwritten Digit Recognition

> This article compares three models on the MNIST handwritten digit recognition task (a single-layer perceptron, a multi-layer neural network, and a convolutional neural network) and shows how model complexity affects image classification performance.

- Board: [Openclaw Geo](https://www.zingnex.cn/en/forum/board/openclaw-geo)
- Published: 2026-06-09T04:45:07.000Z
- Last activity: 2026-06-09T04:49:22.694Z
- Popularity: 152.9
- Keywords: deep learning, convolutional neural network, MNIST, image recognition, perceptron, neural network, TensorFlow, Keras, machine learning
- Page link: https://www.zingnex.cn/en/forum/thread/mnist-47487f8b
- Canonical: https://www.zingnex.cn/forum/thread/mnist-47487f8b
- Markdown source: floors_fallback

---

## From Perceptron to CNN: The Evolution of MNIST Handwritten Digit Recognition (Introduction)

- Original Author/Maintainer: GitForTiger
- Source Platform: GitHub
- Original Project Title: MNIST-classification-perceptron-vs-ANN-vs-CNN
- Original Link: https://github.com/GitForTiger/MNIST-classification-perceptron-vs-ANN-vs-CNN
- Publication Time: June 9, 2026

This article compares three models on the MNIST handwritten digit recognition task: a single-layer perceptron, a multi-layer artificial neural network (ANN), and a convolutional neural network (CNN). The comparison shows how added model complexity and architectural innovation improve image classification performance.

## Background: MNIST Dataset and Handwritten Digit Recognition Challenges

The MNIST dataset is one of the best-known benchmark datasets in machine learning. It contains 70,000 grayscale images of handwritten digits, each 28×28 pixels, in 10 classes (0 to 9), conventionally split into 60,000 training images and 10,000 test images. Since its release in 1998, it has been a standard first test for new algorithms and model architectures. Handwritten digit recognition is a classic challenge because stroke positions and shapes vary from writer to writer.

## Data Preprocessing: Preparing for Model Training

Key steps in data preprocessing:
1. Normalization: Scale pixel values from the range 0-255 to 0-1 to improve training stability and convergence speed.
2. Data Reshaping: The perceptron and ANN require each image to be flattened into a 784-dimensional vector (28×28 = 784), while the CNN keeps the 28×28×1 3D tensor to preserve spatial structure.
3. Label One-Hot Encoding: Convert the integer class labels (0 to 9) into 10-dimensional one-hot vectors so the models can be trained with the categorical cross-entropy loss.
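
The three preprocessing steps can be sketched in plain Python on a toy image. This is only an illustration under stated assumptions: the helper names `normalize`, `flatten`, and `one_hot` are hypothetical, and the original project (tagged with TensorFlow and Keras) would more likely rely on array operations from those libraries.

```python
def normalize(image):
    """Scale 8-bit pixel values (0-255) into the range [0.0, 1.0]."""
    return [[pixel / 255.0 for pixel in row] for row in image]


def flatten(image):
    """Concatenate the rows of a 2D image into one flat vector."""
    return [pixel for row in image for pixel in row]


def one_hot(label, num_classes=10):
    """Encode an integer class label as a one-hot vector."""
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} is outside 0..{num_classes - 1}")
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


# A blank 28x28 image with one fully bright pixel, standing in for a digit.
image = [[0] * 28 for _ in range(28)]
image[14][14] = 255

scaled = normalize(image)
vector = flatten(scaled)                          # input for the perceptron / ANN
tensor = [[[p] for p in row] for row in scaled]   # 28x28x1 input for the CNN

print(len(vector))                                      # 784
print(len(tensor), len(tensor[0]), len(tensor[0][0]))   # 28 28 1
print(one_hot(3))
```

The flattened vector discards the row and column layout, which is why only the 28×28×1 form lets a convolutional layer see neighbouring pixels as neighbours.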

## Model Architecture and Performance

### Single-Layer Perceptron
Architecture: Flatten layer + 10-neuron fully connected layer (Softmax activation). Trained with the SGD optimizer and categorical cross-entropy loss; test accuracy is 90.97%. Key observation: as a linear classifier, it cannot capture non-linear relationships between pixels or exploit the spatial structure of the image.
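
To make the "linear classifier" point concrete, the forward pass of such a model is one weighted sum per class followed by softmax. The sketch below is plain Python rather than the project's Keras code; the function names are hypothetical and the all-zero initialization is an assumption chosen so the output is easy to predict.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)                        # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def perceptron_forward(x, weights, biases):
    """Single dense layer: one linear score per class, then softmax."""
    logits = [
        sum(w * xi for w, xi in zip(class_weights, x)) + b
        for class_weights, b in zip(weights, biases)
    ]
    return softmax(logits)


NUM_INPUTS, NUM_CLASSES = 28 * 28, 10

# Illustrative zero initialization; a real model starts from small random weights.
weights = [[0.0] * NUM_INPUTS for _ in range(NUM_CLASSES)]
biases = [0.0] * NUM_CLASSES

x = [0.5] * NUM_INPUTS                        # stand-in for a flattened, normalized image
probs = perceptron_forward(x, weights, biases)

num_params = NUM_INPUTS * NUM_CLASSES + NUM_CLASSES
print(num_params)                             # 7850
print(probs[0])                               # 0.1, since all-zero weights favour no class
```

Because each class score is a weighted sum of individual pixels, the decision boundaries are flat hyperplanes in pixel space, and shifting a digit by a few pixels changes which weights it activates.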

### Multi-Layer Neural Network
Architecture: Flatten → 128-neuron ReLU hidden layer → 64-neuron hidden layer → 10-neuron Softmax output. Trained with the Adam optimizer; test accuracy is 97.78%. Key observation: non-linear activations and stacked hidden layers enable hierarchical feature learning.
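
The layer widths listed here fix the size of the model. As a rough check, the following sketch counts weights and biases for a stack of fully connected layers; `dense_param_count` is a hypothetical helper, not part of the original project.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases in a stack of fully connected layers.

    layer_sizes lists the width of every layer, input first, output last.
    """
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += fan_in * fan_out + fan_out   # weight matrix plus one bias per unit
    return total


ann_layers = [784, 128, 64, 10]               # flatten -> 128 -> 64 -> 10
print(dense_param_count(ann_layers))          # 109386
print(dense_param_count([784, 10]))           # 7850, the single-layer perceptron
```

The result agrees with the approximate figure of 110,000 parameters quoted for the ANN, and almost all of it comes from the first 784×128 weight matrix.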

### Convolutional Neural Network
Architecture: Conv2D(32) → MaxPool → Conv2D(64) → MaxPool → Flatten → 128-neuron layer (Dropout 0.5) → 10-neuron output. Test accuracy is 99.29%. Key observation: convolution operations extract spatial features automatically; weight sharing and local connectivity make those features respond consistently wherever a pattern appears, and pooling adds a degree of translation invariance.
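
A shape trace helps show how the image shrinks through this stack and where the parameters sit. The article does not state kernel, stride, padding, or pooling sizes, so the sketch below assumes 3×3 kernels, stride 1, no padding, and 2×2 pooling; the resulting numbers hold only under those assumptions, and the helper names are hypothetical.

```python
def conv2d(shape, filters, kernel=3):
    """Output shape and parameter count of an unpadded, stride-1 convolution."""
    height, width, channels = shape
    params = kernel * kernel * channels * filters + filters   # shared weights + biases
    return (height - kernel + 1, width - kernel + 1, filters), params


def max_pool(shape, size=2):
    """Output shape of non-overlapping max pooling (no parameters)."""
    height, width, channels = shape
    return (height // size, width // size, channels)


def dense(inputs, units):
    """Parameter count of a fully connected layer."""
    return inputs * units + units


# ASSUMPTION: 3x3 kernels and 2x2 pooling; the article does not state these sizes.
shape = (28, 28, 1)
shape, conv1_params = conv2d(shape, filters=32)
shape = max_pool(shape)
shape, conv2_params = conv2d(shape, filters=64)
shape = max_pool(shape)

flat = shape[0] * shape[1] * shape[2]
dense_params = dense(flat, 128) + dense(128, 10)   # Dropout adds no parameters

print(shape, flat)                                  # (5, 5, 64) 1600
print(conv1_params, conv2_params)                   # 320 18496
print(conv1_params + conv2_params + dense_params)   # 225034
```

Under these assumed sizes, the two convolutional layers together hold fewer than 19,000 parameters thanks to weight sharing; most of the total sits in the 128-neuron dense layer that follows the flatten step.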

## Performance Comparison and In-depth Analysis

Comparison of key metrics for the three models:

| Metric | Perceptron | ANN | CNN |
|------|--------|-----|-----|
| Test Accuracy | 90.97% | 97.78% | 99.29% |
| Learning Type | Linear | Non-linear | Spatial Feature Learning |
| Complexity | Low | Medium | High |
| Feature Extraction | Manual/None | Learned | Auto-extracted |
| Performance on Image Data | Medium | Excellent | Outstanding |

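From the perceptron to the CNN, test accuracy rises by 8.32 percentage points and the error rate falls from 9.03% to 0.71%, roughly a 12.7-fold reduction. Parameter counts: the perceptron has 7,850 (784×10 weights plus 10 biases) and the ANN about 110,000, while the CNN's parameter count stays manageable because its convolutional filters share weights across image positions.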
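
The arithmetic behind these figures can be reproduced from the reported accuracies alone. The last line additionally assumes the standard 10,000-image MNIST test split, which the article does not confirm for this project.

```python
accuracies = {"Perceptron": 90.97, "ANN": 97.78, "CNN": 99.29}   # test accuracy, %

error_rates = {name: round(100.0 - acc, 2) for name, acc in accuracies.items()}
gain = round(accuracies["CNN"] - accuracies["Perceptron"], 2)
fold_reduction = error_rates["Perceptron"] / error_rates["CNN"]

# ASSUMPTION: the standard 10,000-image MNIST test set was used.
misclassified = {name: round(rate * 100) for name, rate in error_rates.items()}

print(error_rates)                 # {'Perceptron': 9.03, 'ANN': 2.22, 'CNN': 0.71}
print(gain)                        # 8.32
print(round(fold_reduction, 1))    # 12.7
print(misclassified)               # {'Perceptron': 903, 'ANN': 222, 'CNN': 71}
```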

## Visualization Insights: Understanding the Model Learning Process

Visualization tools help understand models:
1. Training Curves: Check whether training and validation accuracy rise together; a widening gap between them suggests overfitting.
2. Loss Curves: Monitor training and validation loss; validation loss that rises while training loss keeps falling is an overfitting signal.
3. Confusion Matrix: Show pairs of digits that are easily misclassified (e.g., 3 and 8, 4 and 9).
4. Sample Prediction Comparison: Get an intuitive feel for how the models' decisions differ and for the CNN's ability to capture fine detail.
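
As an illustration of the confusion-matrix idea, the sketch below builds the matrix from label lists and ranks the digit pairs that are confused most often. The labels are made up for the example and are not results from the project; the function names are hypothetical.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels, num_classes=10):
    """matrix[t][p] counts samples of true class t that were predicted as p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def most_confused_pairs(matrix, top=3):
    """Rank unordered class pairs by their combined off-diagonal error count."""
    pair_errors = Counter()
    for t, row in enumerate(matrix):
        for p, count in enumerate(row):
            if t != p and count:
                pair_errors[(min(t, p), max(t, p))] += count
    return pair_errors.most_common(top)


# Made-up labels for illustration only; these are not results from the project.
y_true = [3, 3, 8, 8, 4, 9, 9, 1, 0, 7]
y_pred = [3, 8, 3, 8, 9, 9, 4, 1, 0, 7]

matrix = confusion_matrix(y_true, y_pred)
print(most_confused_pairs(matrix))            # [((3, 8), 2), ((4, 9), 2)]
```

Correct predictions land on the diagonal, so any large off-diagonal cell points directly at a pair of digits the model struggles to separate.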

## Practical Insights and Future Directions

### Practical Insights
- Prioritize CNNs for image tasks; their accuracy advantage on this benchmark is substantial.
- Start with simple models to establish a baseline, then gradually introduce complex architectures.
- High-quality data preprocessing is the foundation of success.
- Visualization is an important tool for model diagnosis and tuning.

### Future Directions
- Introduce data augmentation to improve generalization ability.
- Try deep architectures like ResNet and DenseNet.
- Explore transfer learning applications.
- Deploy as a web service.
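
As one example of the data augmentation idea listed above, small random translations create shifted copies of each training digit. The sketch below is a plain-Python illustration with hypothetical helper names and an assumed maximum shift of 2 pixels; a deep learning framework would normally provide this through its own preprocessing tools.

```python
import random


def shift_image(image, dx, dy, fill=0.0):
    """Translate a 2D image by (dx, dy) pixels, padding exposed areas with `fill`."""
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            new_y, new_x = y + dy, x + dx
            if 0 <= new_y < height and 0 <= new_x < width:
                shifted[new_y][new_x] = image[y][x]
    return shifted


def augment(image, max_shift=2, rng=random):
    """Return a randomly translated copy of the image."""
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    return shift_image(image, dx, dy)


image = [[0.0] * 28 for _ in range(28)]
image[10][12] = 1.0                            # single bright pixel at row 10, column 12

moved = shift_image(image, dx=2, dy=-1)
print(moved[9][14])                            # 1.0

sample = augment(image, rng=random.Random(0))  # seeded for reproducibility
print(len(sample), len(sample[0]))             # 28 28
```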