Zing Forum

Building an MNIST Handwritten Digit Recognition Model from Scratch: A Practical Comparison of Multilayer Perceptrons and Optimization Algorithms

This article introduces a handwritten digit recognition project based on the MNIST dataset, implemented using a Multilayer Perceptron (MLP) neural network, and compares the performance of different optimization algorithms.

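Tags: MNIST, handwritten digit recognition, multilayer perceptron, MLP, optimization algorithms, machine learning, neural networks, SGD, Adam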
Published 2026-05-25 07:13 | Recent activity 2026-05-25 07:23 | Estimated read 7 min

Section 01

Introduction

This project implements handwritten digit recognition on the MNIST dataset with a Multilayer Perceptron (MLP) neural network, and systematically compares four optimization algorithms: SGD, Adam, RMSprop, and Adagrad. It aims to help developers understand how the choice of optimizer affects model training in practice, and it covers the complete machine learning workflow: data preprocessing, model construction, training, and performance evaluation.

Original Author: Hibabg21 | Source Platform: GitHub | Original Link: https://github.com/Hibabg21/projet_mnist_benguara | Publication Date: 2026-05-24

Section 02

Project Background

Handwritten digit recognition is a classic machine learning problem, and the MNIST dataset, known as the "Hello World" of this field, is widely used in teaching and algorithm validation. This project not only implements basic recognition functions but also helps developers intuitively understand the impact of algorithm selection on model performance by comparing different optimization algorithms.

Section 03

Technical Architecture and Core Methods

The core architecture of the project is a Multilayer Perceptron (MLP): the input layer receives flattened data of 28×28 pixels (784 dimensions), the hidden layers extract features, and the output layer generates classification probabilities for digits 0-9.

The optimization algorithms compared include:

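  • SGD: The baseline method; updates weights along the negative gradient estimated from a mini-batch of data, using one global learning rate
  • Adam: Combines momentum (a moving average of gradients) with RMSprop-style scaling (a moving average of squared gradients) to adapt the learning rate of each parameter
  • RMSprop: Divides each gradient by the root of an exponential moving average of its squared values; suitable for non-stationary objectives
  • Adagrad: Gives each parameter its own learning rate by accumulating its squared gradients, so rarely updated parameters keep larger steps; suitable for sparse data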
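For reference, the four update rules can be written out in their common textbook form. The notation is an assumption of this sketch and does not come from the project: \theta_t are the parameters, g_t the mini-batch gradient, \eta the learning rate, \epsilon a small stabilizing constant, and \rho, \beta_1, \beta_2 decay rates. The project's actual hyperparameter values are not stated in the source.

```latex
\begin{align*}
\text{SGD:}     \quad & \theta_{t+1} = \theta_t - \eta\, g_t \\[4pt]
\text{Adagrad:} \quad & G_t = G_{t-1} + g_t^2, \qquad
                        \theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{G_t} + \epsilon}\, g_t \\[4pt]
\text{RMSprop:} \quad & v_t = \rho\, v_{t-1} + (1 - \rho)\, g_t^2, \qquad
                        \theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{v_t} + \epsilon}\, g_t \\[4pt]
\text{Adam:}    \quad & m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t, \qquad
                        v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \\
                      & \hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad
                        \hat{v}_t = \frac{v_t}{1 - \beta_2^t}, \qquad
                        \theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\, \hat{m}_t
\end{align*}
```

All squares, square roots, and divisions are applied element-wise, which is what gives Adagrad, RMSprop, and Adam a separate effective step size for each parameter.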

Section 04

Implementation Details

Data Preprocessing:

  1. Normalization: Scale pixel values from 0-255 to 0-1
  2. Flattening: Convert 28×28 2D images into 1D vectors
  3. Label Encoding: Convert digit labels to one-hot encoding
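The three preprocessing steps can be sketched in plain Python as follows. This is a minimal illustration, not the project's code: the function names and the synthetic 28×28 image are hypothetical, and a real pipeline would more likely use an array library.

```python
def normalize(image):
    """Scale integer pixel values from 0-255 to floats in 0-1."""
    return [[pixel / 255.0 for pixel in row] for row in image]


def flatten(image):
    """Convert a 2D image (list of rows) into a 1D list, row by row."""
    return [pixel for row in image for pixel in row]


def one_hot(label, num_classes=10):
    """Encode an integer digit label as a one-hot vector."""
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector


# Hypothetical 28x28 image: all black except a single white pixel.
image = [[0] * 28 for _ in range(28)]
image[3][5] = 255

x = flatten(normalize(image))  # 784-dimensional input vector
y = one_hot(7)                 # target vector for the digit 7

print(len(x), x[3 * 28 + 5], y)
```

After flattening, the pixel at row 3, column 5 sits at index 3 × 28 + 5 of the input vector, which is why the MLP's input layer needs 784 neurons.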

Network Structure:

  • Input layer: 784 neurons
  • Hidden layer 1: 128 neurons (ReLU activation)
  • Hidden layer 2: 64 neurons (ReLU activation)
  • Output layer: 10 neurons (Softmax activation)
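To make the layer sizes concrete, here is a small forward-pass sketch of a 784-128-64-10 network in plain Python. The weights are random and untrained, and the He-style initialization, the seed, and all function names are assumptions made for illustration; the source does not say how the project initializes or implements its layers.

```python
import math
import random

LAYER_SIZES = [784, 128, 64, 10]  # input, hidden 1, hidden 2, output


def init_params(sizes, seed=0):
    """Create (weights, biases) per layer. He-style scaling is an assumption."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = math.sqrt(2.0 / n_in)
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        params.append((weights, biases))
    return params


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(values):
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def forward(params, x):
    """ReLU on the hidden layers, softmax on the output layer."""
    activation = x
    last = len(params) - 1
    for index, (weights, biases) in enumerate(params):
        z = [sum(w * a for w, a in zip(row, activation)) + b
             for row, b in zip(weights, biases)]
        activation = softmax(z) if index == last else relu(z)
    return activation


def count_params(params):
    return sum(len(w) * len(w[0]) + len(b) for w, b in params)


params = init_params(LAYER_SIZES)
probs = forward(params, [0.5] * 784)  # dummy mid-gray input image
print(len(probs), round(sum(probs), 6), count_params(params))
```

With these layer sizes the network has 784×128 + 128 + 128×64 + 64 + 64×10 + 10 = 109,386 trainable parameters, nearly all of them in the first layer.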

Loss and Evaluation: The cross-entropy loss function is used. Evaluation metrics are accuracy, loss value, and convergence speed (how quickly the loss falls during training).
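As a rough illustration of the loss and the accuracy metric, the sketch below evaluates them on a made-up three-class batch. The probabilities, labels, and the `eps` guard are assumptions chosen for the example, not values from the project.

```python
import math


def cross_entropy(probs, one_hot_target, eps=1e-12):
    """Cross-entropy between predicted probabilities and a one-hot target."""
    # eps guards against log(0); it is an implementation detail of this sketch.
    return -sum(t * math.log(p + eps) for p, t in zip(probs, one_hot_target))


def accuracy(batch_probs, labels):
    """Fraction of samples whose highest-probability class matches the label."""
    correct = sum(1 for probs, label in zip(batch_probs, labels)
                  if probs.index(max(probs)) == label)
    return correct / len(labels)


# Made-up predictions for three samples over three classes.
batch_probs = [[0.7, 0.2, 0.1],
               [0.1, 0.8, 0.1],
               [0.3, 0.4, 0.3]]
labels = [0, 1, 2]
targets = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 0.0],
           [0.0, 0.0, 1.0]]

mean_loss = sum(cross_entropy(p, t)
                for p, t in zip(batch_probs, targets)) / len(labels)
acc = accuracy(batch_probs, labels)
print(f"mean loss = {mean_loss:.4f}, accuracy = {acc:.4f}")
```

The third sample is misclassified (class 1 is predicted, class 2 is correct), so accuracy is 2/3, and that sample contributes the largest share of the loss because only 0.3 probability was assigned to the true class.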

Section 05

Performance Comparison of Optimization Algorithms

SGD: Its advantages are cheap computation and low memory usage; its disadvantages are a tendency to stall in poor local optima or on plateaus, slow convergence, and sensitivity to the learning rate.

Adam: Its strengths are adaptive learning rate adjustment, fast convergence, relatively low sensitivity to the initial learning rate, and strong performance on small to medium datasets such as MNIST.

Other Algorithms:

  • RMSprop: Suitable for recurrent neural networks or non-stationary target problems
  • Adagrad: Suitable for datasets with sparse features (e.g., text classification)
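One way to get a feel for these differences without training a network is to run each optimizer on the same small problem and record the loss at every step. The sketch below does this on a toy two-parameter quadratic with one steep and one shallow direction. Everything here is an assumption for illustration: the toy loss, the class names, the starting point, and the learning rates are not taken from the project, and behaviour on this toy problem does not predict results on MNIST.

```python
import math


def loss_and_grad(params):
    """Toy loss 0.5 * (x^2 + 25 * y^2): shallow in x, steep in y."""
    x, y = params
    return 0.5 * (x * x + 25.0 * y * y), [x, 25.0 * y]


class SGD:
    def __init__(self, lr=0.01):
        self.lr = lr

    def step(self, params, grads):
        return [p - self.lr * g for p, g in zip(params, grads)]


class Adagrad:
    def __init__(self, lr=0.1, eps=1e-8):
        self.lr, self.eps, self.accum = lr, eps, None

    def step(self, params, grads):
        if self.accum is None:
            self.accum = [0.0] * len(params)
        # Running sum of squared gradients, one entry per parameter.
        self.accum = [a + g * g for a, g in zip(self.accum, grads)]
        return [p - self.lr * g / (math.sqrt(a) + self.eps)
                for p, g, a in zip(params, grads, self.accum)]


class RMSprop:
    def __init__(self, lr=0.01, rho=0.9, eps=1e-8):
        self.lr, self.rho, self.eps, self.avg = lr, rho, eps, None

    def step(self, params, grads):
        if self.avg is None:
            self.avg = [0.0] * len(params)
        # Exponential moving average of squared gradients.
        self.avg = [self.rho * v + (1 - self.rho) * g * g
                    for v, g in zip(self.avg, grads)]
        return [p - self.lr * g / (math.sqrt(v) + self.eps)
                for p, g, v in zip(params, grads, self.avg)]


class Adam:
    def __init__(self, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, beta1, beta2, eps
        self.m, self.v, self.t = None, None, 0

    def step(self, params, grads):
        if self.m is None:
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.t += 1
        self.m = [self.b1 * m + (1 - self.b1) * g
                  for m, g in zip(self.m, grads)]
        self.v = [self.b2 * v + (1 - self.b2) * g * g
                  for v, g in zip(self.v, grads)]
        # Bias correction compensates for the zero-initialized averages.
        m_hat = [m / (1 - self.b1 ** self.t) for m in self.m]
        v_hat = [v / (1 - self.b2 ** self.t) for v in self.v]
        return [p - self.lr * mh / (math.sqrt(vh) + self.eps)
                for p, mh, vh in zip(params, m_hat, v_hat)]


def run(optimizer, start=(5.0, 2.0), steps=200):
    """Return the loss before each step plus the final loss."""
    params, history = list(start), []
    for _ in range(steps):
        loss, grads = loss_and_grad(params)
        history.append(loss)
        params = optimizer.step(params, grads)
    history.append(loss_and_grad(params)[0])
    return history


optimizers = {"SGD": SGD(), "Adagrad": Adagrad(),
              "RMSprop": RMSprop(), "Adam": Adam()}
results = {name: run(opt) for name, opt in optimizers.items()}
for name, history in results.items():
    print(f"{name:8s} start={history[0]:.3f} end={history[-1]:.6f}")
```

Plotting each `history` list against the step index gives the kind of convergence curve that a comparison like the one above relies on. Because the learning rates differ per optimizer, the curves reflect these particular settings as much as the algorithms themselves.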

Section 06

Practical Significance and Application Insights

Optimizer selection directly affects training time, final accuracy, and resource consumption. In practical projects, choosing and tuning the optimizer can improve results as much as, and sometimes more than, adjusting the network structure.

The MNIST workflow carries over to more complex tasks:

  • Harder character and digit recognition (EMNIST handwritten characters, SVHN street-view house numbers)
  • Document scanning and digitization
  • Automatic bank check recognition
  • Automatic postal code sorting

Section 07

Summary and Outlook

This project gives machine learning beginners a complete practical case: it demonstrates the full workflow from data preparation to model evaluation, and it shows that the choice of optimizer matters and should be weighed against the task and the characteristics of the data.

Future exploration directions:

  • Introduce Convolutional Neural Networks (CNN) to improve accuracy
  • Implement data augmentation to enhance generalization ability
  • Deploy the model as a web service for practical use