Zing Forum

Building a Neural Network from Scratch: MNIST Handwritten Digit Recognition with Pure NumPy

This article explains how to build a deep neural network from scratch in pure NumPy for MNIST handwritten digit recognition, and how core mechanisms such as backpropagation and gradient descent work.

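Tags: Neural Network, NumPy, MNIST, Handwritten Digit Recognition, Backpropagation, Deep Learning, Machine Learning, From-Scratch Implementation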
Published 2026-05-30 02:43 | Recent activity 2026-05-30 02:49 | Estimated read 10 min
Section 02

Project Background and Introduction to the MNIST Dataset

Project Background

Deep learning frameworks like TensorFlow and PyTorch simplify neural network construction but hide underlying details. For learners who want to truly understand how neural networks work, implementing from scratch is an irreplaceable learning experience. Based on this idea, this project uses pure NumPy to build a deep neural network to solve the MNIST handwritten digit recognition problem.

MNIST Dataset

MNIST is a well-known benchmark dataset in the machine learning field, containing 70,000 28×28 pixel grayscale images of handwritten digits, divided into 10 categories (0-9). Among them, 60,000 are used for training and 10,000 for testing. Its moderate scale and reasonable difficulty make it an ideal choice for verifying new algorithms.

Section 03

Neural Network Architecture Design

The project implements a multi-layer feedforward neural network with the following core components:

Input Layer

Receives flattened image data. Since MNIST images are 28×28 pixels, the input layer has 784 neurons (28×28=784), each corresponding to a pixel value.
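The flattening step can be sketched as follows. This is a minimal illustration, not the project's code: the function name `flatten_images` is hypothetical, the images are random stand-ins rather than real MNIST data, and scaling pixels to [0, 1] is a common preprocessing assumption that the article does not state.

```python
import numpy as np

def flatten_images(images):
    """Turn (N, 28, 28) uint8 images into a (784, N) float matrix.

    Each column is one flattened image. Dividing by 255 scales pixels
    to [0, 1]; that scaling is an assumption, not taken from the article.
    """
    n = images.shape[0]
    return images.reshape(n, -1).T.astype(np.float64) / 255.0

# Random stand-in for MNIST: 5 fake 28x28 grayscale images.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

X = flatten_images(fake_images)
print(X.shape)  # (784, 5)
```

Storing one sample per column keeps the data compatible with the z = W·a + b form used for forward propagation.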

Hidden Layer

The network contains one or more hidden layers that use the ReLU activation function (f(x)=max(0,x)), which helps alleviate the vanishing gradient problem.
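As a small sketch of the activation described above, ReLU and its derivative can be written in NumPy like this. The function names are illustrative, and treating the derivative at exactly 0 as 0 is a convention chosen here.

```python
import numpy as np

def relu(z):
    """Element-wise f(x) = max(0, x)."""
    return np.maximum(0.0, z)

def relu_grad(z):
    """Derivative of ReLU: 1 where z > 0, else 0 (0 at z == 0 by convention)."""
    return (z > 0).astype(np.float64)

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))       # [0. 0. 3.]
print(relu_grad(z))  # [0. 0. 1.]
```

Because the derivative is exactly 1 for positive inputs, gradients passing through active units are not shrunk the way they are with saturating activations.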

Output Layer

The output layer has 10 neurons, one per digit class. It applies the Softmax activation function to turn the raw scores into a probability distribution, so the predicted class is simply the one with the highest probability.
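A minimal Softmax sketch is shown below. It assumes one sample per column and subtracts the column maximum before exponentiating, a standard numerical-stability trick that the article does not mention.

```python
import numpy as np

def softmax(z):
    """Column-wise softmax for logits of shape (classes, batch)."""
    shifted = z - z.max(axis=0, keepdims=True)  # avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum(axis=0, keepdims=True)

# Two samples with three classes each; the second has very large logits.
logits = np.array([[1.0, 1000.0],
                   [2.0, 1000.0],
                   [3.0, 1000.0]])
probs = softmax(logits)
pred = probs.argmax(axis=0)  # predicted class index per sample

print(probs.sum(axis=0))  # each column sums to 1
print(pred)
```

Without the shift, `np.exp(1000.0)` would overflow to infinity and the second column would become NaN.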

Section 04

Analysis of Core Algorithm Mechanisms

Forward Propagation

Data is passed layer by layer from the input layer to the output layer. Each layer computes:

  z = W·a + b
  a_next = activation(z)

where W is the weight matrix, b is the bias vector, and a is the activation of the previous layer.
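The forward pass can be sketched for an arbitrary stack of layers as follows. This is an illustration under stated assumptions: the helper names, the 128-unit hidden layer, and the 0.01 weight scale are all hypothetical choices, not values from the project.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def init_params(layer_sizes, rng, scale=0.01):
    """Small random weights and zero biases (the scale is an assumption)."""
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = rng.standard_normal((n_out, n_in)) * scale
        b = np.zeros((n_out, 1))
        params.append((W, b))
    return params

def forward(a, params):
    """Apply z = W @ a + b per layer: ReLU on hidden layers, softmax at the end."""
    for W, b in params[:-1]:
        a = relu(W @ a + b)
    W, b = params[-1]
    return softmax(W @ a + b)

rng = np.random.default_rng(0)
params = init_params([784, 128, 10], rng)  # 128 hidden units is an assumed size
X = rng.random((784, 4))                   # 4 fake flattened images as columns
probs = forward(X, params)
print(probs.shape)  # (10, 4)
```

Each weight matrix has shape (units in this layer, units in the previous layer), which is what makes `W @ a` dimensionally valid.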

Loss Function

Cross-entropy loss measures the gap between the predicted distribution and the true label. For multi-class classification it is defined as:

  L = -Σ(y_i · log(ŷ_i))

where y_i is the i-th component of the one-hot encoded true label, and ŷ_i is the Softmax output probability for class i.
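A short sketch of one-hot encoding and the cross-entropy formula follows. Averaging over the batch and adding a small `eps` inside the logarithm are assumptions made here for practicality; the formula above is stated per sample.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Convert (m,) integer labels to a (num_classes, m) one-hot matrix."""
    y = np.zeros((num_classes, labels.size))
    y[labels, np.arange(labels.size)] = 1.0
    return y

def cross_entropy(y, y_hat, eps=1e-12):
    """Batch mean of L = -sum_i y_i * log(y_hat_i); eps guards against log(0)."""
    return -np.sum(y * np.log(y_hat + eps)) / y.shape[1]

labels = np.array([3, 7])
y = one_hot(labels)
y_hat = np.full((10, 2), 0.1)  # a uniform guess over the 10 digits
loss = cross_entropy(y, y_hat)
print(loss)  # close to ln(10), about 2.3026
```

A uniform guess gives a loss of ln(10), which is a useful sanity check: an untrained 10-class network should start near that value.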

Backpropagation

Backpropagation applies the chain rule to efficiently compute the gradient of the loss with respect to each layer's parameters. The error signal propagates backward from the output layer, and the resulting gradients determine how the parameters are adjusted.
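The chain-rule computation can be sketched for a network with one ReLU hidden layer and a Softmax output. The layer sizes, variable names, and the batch-mean loss are assumptions for illustration; the project may organize this differently.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def forward(X, W1, b1, W2, b2):
    Z1 = W1 @ X + b1
    A1 = np.maximum(0.0, Z1)   # ReLU
    A2 = softmax(W2 @ A1 + b2)
    return Z1, A1, A2

def backward(X, Y, W2, Z1, A1, A2):
    """Gradients of the batch-mean cross-entropy loss."""
    m = X.shape[1]
    dZ2 = (A2 - Y) / m                    # softmax + cross-entropy simplifies to y_hat - y
    dW2 = dZ2 @ A1.T
    db2 = dZ2.sum(axis=1, keepdims=True)
    dA1 = W2.T @ dZ2                      # error signal sent back through W2
    dZ1 = dA1 * (Z1 > 0)                  # multiply by the ReLU derivative
    dW1 = dZ1 @ X.T
    db1 = dZ1.sum(axis=1, keepdims=True)
    return dW1, db1, dW2, db2

rng = np.random.default_rng(0)
X = rng.random((784, 4))                               # 4 fake samples as columns
Y = np.eye(10)[:, rng.integers(0, 10, size=4)]         # one-hot labels, shape (10, 4)
W1, b1 = rng.standard_normal((16, 784)) * 0.01, np.zeros((16, 1))
W2, b2 = rng.standard_normal((10, 16)) * 0.01, np.zeros((10, 1))

Z1, A1, A2 = forward(X, W1, b1, W2, b2)
dW1, db1, dW2, db2 = backward(X, Y, W2, Z1, A1, A2)
print(dW1.shape, db1.shape, dW2.shape, db2.shape)
```

Every gradient has the same shape as the parameter it belongs to, which is a quick check worth running before any training.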

Gradient Descent Optimization

The parameters are updated as:

  W = W - α · ∂L/∂W
  b = b - α · ∂L/∂b

where α is the learning rate, which controls the update step size.
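The update rule translates almost directly into code. The sketch below uses a hypothetical `sgd_step` helper and a dictionary layout chosen for illustration, with hand-picked numbers so the arithmetic is easy to follow.

```python
import numpy as np

def sgd_step(params, grads, lr):
    """In-place update: param <- param - lr * gradient, for every parameter."""
    for key in params:
        params[key] -= lr * grads["d" + key]
    return params

params = {"W": np.array([[1.0, 2.0]]), "b": np.array([[0.5]])}
grads = {"dW": np.array([[0.5, -1.0]]), "db": np.array([[1.0]])}

sgd_step(params, grads, lr=0.1)
# W moves from [[1.0, 2.0]] to about [[0.95, 2.1]]; b moves from 0.5 to about 0.4.
print(params["W"], params["b"])
```

A negative gradient entry increases the corresponding weight, since the update always moves against the gradient.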

Section 05

Advantages of Pure NumPy Implementation

  1. Efficient Matrix Operations: NumPy runs array operations in optimized compiled code (C, plus BLAS/LAPACK routines for linear algebra), so the Python code stays simple while still performing well.
  2. Full Control Over Details: From matching matrix dimensions to choosing activation functions, every decision is explicit in the code, which helps in understanding the mathematical foundations.
  3. Lightweight with Minimal Dependencies: The implementation depends only on NumPy rather than a large deep learning framework; the code is easy to read and modify, which suits teaching and learning.
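To illustrate the first two points, the sketch below computes one layer's pre-activations for a whole batch with a single matrix product and compares it against an explicit per-sample loop. The sizes (128 units, 32 samples) are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((128, 784))  # (units, inputs); 128 is an assumed hidden size
b = np.zeros((128, 1))
X = rng.random((784, 32))            # 32 fake samples stored as columns

# One matrix product handles the whole batch; b broadcasts across the 32 columns.
Z_batch = W @ X + b

# Equivalent per-sample loop, shown only for comparison.
Z_loop = np.stack([W @ X[:, j] + b[:, 0] for j in range(X.shape[1])], axis=1)

print(Z_batch.shape)                 # (128, 32)
print(np.allclose(Z_batch, Z_loop))  # True
```

The shapes make the dimension matching explicit: (128, 784) times (784, 32) gives (128, 32), and the (128, 1) bias is broadcast to every column.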

Section 06

Training Process and Tuning Tips

  • Hyperparameter Selection: Need to adjust learning rate, hidden layer size, number of training epochs, etc. The author explores the performance of different configurations through experiments.
  • Monitoring Metrics: Training loss should gradually decrease and validation accuracy should gradually increase. If training loss keeps decreasing while validation accuracy stagnates or drops, the model is probably overfitting.
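The monitoring idea above can be sketched as a compact training loop that records training loss and validation accuracy every epoch. Everything here is an assumption for illustration: the data is synthetic (not MNIST), training is full-batch, and the hyperparameter values (`hidden=16`, `lr=0.1`, `epochs=200`) are arbitrary starting points rather than the author's settings.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def forward(X, p):
    Z1 = p["W1"] @ X + p["b1"]
    A1 = np.maximum(0.0, Z1)
    A2 = softmax(p["W2"] @ A1 + p["b2"])
    return Z1, A1, A2

def train(X_tr, Y_tr, X_val, y_val, hidden=16, lr=0.1, epochs=200, seed=0):
    rng = np.random.default_rng(seed)
    n_in, m = X_tr.shape
    n_out = Y_tr.shape[0]
    p = {"W1": rng.standard_normal((hidden, n_in)) * 0.1, "b1": np.zeros((hidden, 1)),
         "W2": rng.standard_normal((n_out, hidden)) * 0.1, "b2": np.zeros((n_out, 1))}
    history = {"loss": [], "val_acc": []}
    for _ in range(epochs):
        Z1, A1, A2 = forward(X_tr, p)
        # Record both metrics before this epoch's update.
        history["loss"].append(float(-np.sum(Y_tr * np.log(A2 + 1e-12)) / m))
        val_pred = forward(X_val, p)[2].argmax(axis=0)
        history["val_acc"].append(float(np.mean(val_pred == y_val)))
        # Backpropagation for the batch-mean cross-entropy loss.
        dZ2 = (A2 - Y_tr) / m
        dZ1 = (p["W2"].T @ dZ2) * (Z1 > 0)
        grads = {"W1": dZ1 @ X_tr.T, "b1": dZ1.sum(axis=1, keepdims=True),
                 "W2": dZ2 @ A1.T, "b2": dZ2.sum(axis=1, keepdims=True)}
        for k in p:
            p[k] -= lr * grads[k]  # gradient descent step
    return p, history

# Synthetic 3-class data: labels come from a random linear rule on 10 features.
rng = np.random.default_rng(1)
X = rng.standard_normal((10, 400))
labels = (rng.standard_normal((3, 10)) @ X).argmax(axis=0)
Y = np.eye(3)[:, labels]
X_tr, X_val = X[:, :300], X[:, 300:]
Y_tr, y_val = Y[:, :300], labels[300:]

params, history = train(X_tr, Y_tr, X_val, y_val)
print(f"loss {history['loss'][0]:.3f} -> {history['loss'][-1]:.3f}, "
      f"val acc {history['val_acc'][-1]:.2f}")
```

Plotting or printing `history["loss"]` next to `history["val_acc"]` makes the overfitting pattern visible: the first keeps falling while the second flattens or declines.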

Section 07

Practical Significance and Application Scenarios

The concepts and skills from this project transfer to more complex tasks and are especially useful for:

  • Algorithm Research and Innovation: Proposing worthwhile improvements requires a deep understanding of the basic principles.
  • Model Debugging and Optimization: Knowledge of the underlying mechanics helps locate abnormal behavior quickly when working with high-level frameworks.
  • Resource-Constrained Environments: A lightweight custom implementation can be a better fit for embedded or edge computing scenarios than a general-purpose framework.
  • Teaching and Knowledge Dissemination: A clear low-level implementation is excellent material for teaching neural network principles.

Section 08

Learning Suggestions and Conclusion

Learning Suggestions

  1. Foundation Preparation: Master basic concepts of linear algebra (matrix operations) and calculus (partial derivatives).
  2. Manual Deduction: First manually calculate forward/backward propagation of small networks to build an intuitive understanding.
  3. Function Expansion: Try different activation functions (Tanh, Sigmoid), regularization techniques (L2, Dropout), and optimizers (Adam, RMSprop).
  4. Transfer Application: Apply what you have learned to other datasets such as CIFAR-10 image classification or IMDb sentiment analysis.
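One practical way to follow the manual-deduction suggestion is a numerical gradient check: compare the hand-derived gradient with a finite-difference estimate on a tiny network. The sketch below does this for a single Softmax layer; the function names and sizes are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def loss_fn(W, b, X, Y):
    """Batch-mean cross-entropy of a single softmax layer."""
    A = softmax(W @ X + b)
    return -np.sum(Y * np.log(A)) / X.shape[1]

def analytic_grad_W(W, b, X, Y):
    """Hand-derived gradient: (y_hat - y) @ X^T / m."""
    A = softmax(W @ X + b)
    return (A - Y) @ X.T / X.shape[1]

def numerical_grad_W(W, b, X, Y, h=1e-5):
    """Central-difference estimate, one weight at a time."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[idx] += h
        W_minus[idx] -= h
        grad[idx] = (loss_fn(W_plus, b, X, Y) - loss_fn(W_minus, b, X, Y)) / (2 * h)
    return grad

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))                # 3 classes, 4 input features
b = np.zeros((3, 1))
X = rng.standard_normal((4, 5))                # 5 samples as columns
Y = np.eye(3)[:, rng.integers(0, 3, size=5)]   # one-hot labels, shape (3, 5)

max_diff = np.max(np.abs(analytic_grad_W(W, b, X, Y) - numerical_grad_W(W, b, X, Y)))
print(max_diff < 1e-6)  # True
```

If the two gradients disagree by more than a tiny tolerance, the derivation or its implementation likely contains an error, which makes this check a useful habit before extending the network with new activations or regularizers.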

Conclusion

Building a neural network from scratch is a challenging yet rewarding learning project. In today's era of deep learning tooling, understanding underlying mechanisms is not only an academic pursuit but also a necessary path to becoming an excellent machine learning engineer.