
Back to Basics: Implementing MNIST Handwritten Digit Recognition with a Pure NumPy Neural Network

An educational project that implements a feedforward neural network from scratch, using only NumPy and no deep learning framework, to reveal the mathematical essence of neural networks through the MNIST handwritten digit classification task.

Tags: Neural Networks · MNIST · NumPy · Backpropagation · Deep Learning · Feedforward Networks · Handwritten Digit Recognition · Gradient Descent · Machine Learning Fundamentals · From-Scratch Implementation
Published 2026-05-11 10:59 · Recent activity 2026-05-11 11:05 · Estimated read 6 min

Section 01

[Introduction] The Core Significance of Implementing MNIST Recognition with a Neural Network Hand-Written in Pure NumPy

In an era when PyTorch and TensorFlow are everywhere, this project implements a feedforward neural network in pure NumPy to perform MNIST handwritten digit recognition. The aim is to strip away the details that frameworks encapsulate and help developers understand the mathematical essence of neural networks (backpropagation, gradient descent, and so on). It is a "framework-free" practical exercise for learning the fundamentals of deep learning.


Section 02

Background: Why MNIST Remains an Ideal Educational Dataset

The MNIST dataset contains 60,000 training images and 10,000 test images, each a 28×28 grayscale image labeled with a digit from 0 to 9. Its value as a teaching dataset lies in its moderate scale (it trains quickly on an ordinary laptop), an intuitive task (handwritten digit recognition is easy to understand), and its ability to exhibit the real phenomena of training (batch processing, convergence curves, overfitting, and so on).
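Before any of that, the images must be turned into inputs an MLP can consume. A minimal preprocessing sketch, assuming the raw data has already been loaded into NumPy arrays of the shapes noted in the comments (the loading step itself is omitted here):

```python
import numpy as np

def preprocess(images, labels, num_classes=10):
    # Assumed shapes: images is (N, 28, 28) uint8, labels is (N,) integers 0-9.
    X = images.reshape(len(images), -1).astype(np.float32) / 255.0  # (N, 784), scaled to [0, 1]
    Y = np.eye(num_classes, dtype=np.float32)[labels]               # (N, 10) one-hot rows
    return X, Y
```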


Section 03

Method: Mathematical Structure of Feedforward Neural Networks

Each layer of the feedforward network (MLP) implemented in the project performs the following operations (a NumPy sketch follows the list):

  1. Linear transformation: z = Wx + b (W is the weight matrix, x is the input vector, b is the bias)
  2. Nonlinear activation: a = σ(z) (without a nonlinearity, stacked layers collapse into a single linear layer; ReLU, f(x) = max(0, x), is a common hidden-layer activation)
  3. Output layer: The Softmax function converts the output into a probability distribution (ensuring non-negativity and sum to 1).
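Put together, a one-hidden-layer forward pass is a few lines of NumPy. This sketch uses the batched convention X @ W (rows are samples) rather than the per-vector Wx above; the layer names are illustrative:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    # Subtract the row-wise max for numerical stability (see Section 05).
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def forward(X, W1, b1, W2, b2):
    # Linear -> ReLU -> linear -> softmax, for a batch X of shape (N, 784).
    z1 = X @ W1 + b1            # linear transformation
    a1 = relu(z1)               # nonlinear activation
    z2 = a1 @ W2 + b2           # logits for the 10 digit classes
    return a1, softmax(z2)      # a1 is cached for backpropagation later
```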

Section 04

Method: Core Principles of Backpropagation

The core of training is backpropagation, based on the chain rule:

  • Forward propagation: the input is passed layer by layer to produce predictions, and the cross-entropy loss is computed (measuring the distance between the predictions and the true labels)
  • Backpropagation: starting from the loss, parameter gradients are computed layer by layer; the chain rule multiplies local gradients together to obtain the total gradient

Writing this by hand forces you to derive the gradient formulas yourself, which builds intuition about where gradients come from and what they do.
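For the network sketched in Section 03, these chain-rule products reduce to a handful of matrix expressions. A minimal sketch, assuming `a1` and `probs` are the values cached by the `forward` function above and `Y` is a one-hot label matrix (names are illustrative):

```python
import numpy as np

def backward(X, Y, a1, probs, W2):
    # With softmax outputs and cross-entropy loss, dL/dz2 simplifies to (probs - Y).
    n = len(X)
    dz2 = (probs - Y) / n       # output-layer error, averaged over the batch
    dW2 = a1.T @ dz2            # chain rule: dL/dW2 = a1^T (dL/dz2)
    db2 = dz2.sum(axis=0)
    da1 = dz2 @ W2.T            # propagate the error back through W2
    dz1 = da1 * (a1 > 0)        # ReLU derivative: 1 where the unit was active
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0)
    return dW1, db1, dW2, db2
```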

Section 05

Implementation Details: Key Considerations for NumPy Implementation

  1. Weight initialization: avoid all-zero initialization (the symmetry trap); Xavier/He initialization is the common choice (points 1 and 2 are sketched below)
  2. Batch processing: use matrix operations over mini-batches of samples, balancing efficiency and gradient stability
  3. Learning rate scheduling: experiment with strategies such as linear or exponential decay to improve convergence
  4. Numerical stability: softmax must subtract the maximum input value to prevent overflow: softmax(x) = exp(x − max(x)) / Σ exp(x − max(x))
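A sketch of the first two points, with illustrative layer sizes and a hypothetical mini-batch iterator (the stable softmax from point 4 already appears in the Section 03 sketch):

```python
import numpy as np

rng = np.random.default_rng(0)

def he_init(fan_in, fan_out):
    # He initialization: zero-mean Gaussian scaled by sqrt(2 / fan_in), suited to ReLU.
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

W1, b1 = he_init(784, 128), np.zeros(128)   # hidden layer: 784 -> 128
W2, b2 = he_init(128, 10), np.zeros(10)     # output layer: 128 -> 10

def iterate_minibatches(X, Y, batch_size=64):
    # Shuffle once per epoch, then yield mini-batches as matrix slices.
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], Y[batch]
```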

Section 06

Evidence: Intuitive Understanding from Experiments

With a hand-written implementation, you can experiment freely (a training-loop sketch follows this list):

  • Adjust the number of hidden neurons/layers to observe the relationship between model capacity and overfitting
  • Swap activation functions (e.g., Sigmoid → ReLU) to feel the change in training speed
  • Adjust the learning rate to watch the loss oscillate or settle

The final model can reach over 97% test accuracy, with a sense of hands-on engagement that calling a framework cannot provide.
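A hypothetical loop tying the earlier sketches together; `lr`, the hidden width chosen in the initialization sketch, and the activation are the knobs the bullets above suggest turning. It assumes `X_train`/`Y_train` (one-hot) and `X_test`/`y_test` (integer labels) from the preprocessing step, plus the `forward`, `backward`, and `iterate_minibatches` sketches:

```python
lr, epochs = 0.1, 10   # illustrative hyperparameters

for epoch in range(epochs):
    for Xb, Yb in iterate_minibatches(X_train, Y_train):
        a1, probs = forward(Xb, W1, b1, W2, b2)               # forward pass
        dW1, db1, dW2, db2 = backward(Xb, Yb, a1, probs, W2)  # gradients
        W1 -= lr * dW1                                        # vanilla SGD update
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    lr *= 0.95  # exponential learning-rate decay (Section 05, point 3)
    _, test_probs = forward(X_test, W1, b1, W2, b2)
    acc = (test_probs.argmax(axis=1) == y_test).mean()
    print(f"epoch {epoch}: test accuracy {acc:.3f}")
```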

Section 07

Suggestions: Learning Directions to Expand from MNIST

After mastering the hand-written MLP, you can advance to:

  1. Architecture expansion: implement convolutional layers (to extract spatial features) and recurrent layers (to process sequences)
  2. Optimizer advancement: implement modern optimizers such as Momentum and Adam (a minimal Momentum sketch follows this list)
  3. Regularization: add Dropout, L2 regularization, and Batch Normalization
  4. Return to frameworks: when you next use PyTorch/TensorFlow, you will understand the mathematical meaning behind the APIs.
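As a taste of point 2, a minimal Momentum sketch: the velocity accumulates past gradients, damping oscillation and accelerating movement along consistently downhill directions (Adam extends the same idea with per-parameter scaling). The class interface here is illustrative, not from the project:

```python
import numpy as np

class Momentum:
    def __init__(self, params, lr=0.01, beta=0.9):
        # params is a list of NumPy arrays that will be updated in place.
        self.params, self.lr, self.beta = params, lr, beta
        self.velocity = [np.zeros_like(p) for p in params]

    def step(self, grads):
        for p, v, g in zip(self.params, self.velocity, grads):
            v *= self.beta      # decay the old velocity
            v += g              # accumulate the current gradient
            p -= self.lr * v    # in-place parameter update
```

Something like `opt = Momentum([W1, b1, W2, b2], lr=0.1)` followed by `opt.step([dW1, db1, dW2, db2])` would replace the four SGD update lines in the Section 06 loop.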

Section 08

Conclusion: Implementing from Scratch is a Must for Becoming an AI Engineer

"Implementing from scratch" may go against the trend, but it's like understanding the principle of an engine to learn driving well: This project condenses the core ideas of deep learning (forward propagation, loss calculation, backpropagation, parameter update) within 500 lines, which is a key step from being an "API caller" to a real AI engineer.