Zing Forum

Pure NumPy Implementation of MNIST Handwritten Digit Recognition: Building a Neural Network from Scratch

A handwritten digit classifier implemented from scratch using only NumPy, without relying on any deep learning frameworks. It achieves approximately 90% accuracy on the MNIST dataset through manual implementation of forward propagation, backpropagation, and gradient descent.

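Tags: Neural Network, NumPy, MNIST, Handwritten Digit Recognition, Backpropagation, Gradient Descent, Deep Learning, Teaching, From-Scratch Implementation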
Published 2026-06-14 15:46 | Recent activity 2026-06-14 15:53 | Estimated read 8 min

Section 01

Main Floor | Pure NumPy Implementation of MNIST Neural Network: A Learning Guide from Scratch

This project implements an MNIST handwritten digit recognition neural network from scratch using only NumPy, with no deep learning framework. Its core goal is to help learners understand the underlying principles of neural networks (forward propagation, backpropagation, gradient descent, and so on). The network reaches approximately 90% test accuracy on the MNIST dataset.

Project Information

The project deliberately avoids high-level libraries such as TensorFlow and PyTorch and demonstrates how neural networks work from first principles, which makes it a good hands-on resource for deep learning beginners.

Section 02

Background and Core Features

Project Background

Now that deep learning frameworks hide most of the implementation detail, fewer developers understand what happens inside a neural network. This project aims to fill that gap: by implementing everything in pure NumPy, it lets learners work through the mathematics and the code of each component.

Core Technology Stack

✅ Used: linear algebra (matrix operations), calculus (chain rule), forward/backward propagation, gradient descent, ReLU/Softmax activations, one-hot encoding, NumPy matrix operations.

❌ Rejected: high-level deep learning libraries and pre-built APIs such as TensorFlow, PyTorch, and scikit-learn.

This deliberately stripped-down design makes the project a good textbook example for learning how neural networks work.

Section 03

Detailed Technical Architecture

Network Structure (Inferred)

Judging from the MNIST task and the technology stack, the project implements a standard Multi-Layer Perceptron (MLP): input layer (784 neurons, one per pixel of the flattened 28×28 image) → hidden layer (ReLU activation) → output layer (10 neurons, Softmax activation) → predicted probability distribution over the digits 0-9.
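
To make the layer sizes concrete, here is a minimal sketch of the parameter shapes such a network would need. The post does not state the hidden layer width, so `n_hidden=64`, the He-style weight scaling, and the name `init_params` are all assumptions made for illustration.

```python
import numpy as np

def init_params(n_in=784, n_hidden=64, n_out=10, seed=0):
    """Create weights and biases for a 784 -> n_hidden -> 10 MLP.

    Shapes follow the column convention Z = W·X + b, where X is (n_in, m).
    n_hidden=64 is a guess; the original post does not give the width.
    """
    rng = np.random.default_rng(seed)
    # He-style scaling (an assumption) keeps ReLU activations in a sane range.
    W1 = rng.standard_normal((n_hidden, n_in)) * np.sqrt(2.0 / n_in)
    b1 = np.zeros((n_hidden, 1))
    W2 = rng.standard_normal((n_out, n_hidden)) * np.sqrt(2.0 / n_hidden)
    b2 = np.zeros((n_out, 1))
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}

params = init_params()
for name, value in params.items():
    print(name, value.shape)

# Total number of trainable values in this configuration.
n_params = sum(p.size for p in params.values())
print("parameters:", n_params)
```

With these assumed sizes the network has 784·64 + 64 + 64·10 + 10 trainable values, which is small enough to train on a CPU.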

Key Implementations

  1. Forward Propagation: a linear transformation (Z = W·X + b) followed by a non-linear activation (ReLU in the hidden layer, Softmax at the output).
  2. Backpropagation: compute the gradients of the loss with respect to each parameter using the chain rule, working backward from the output layer to the input layer.
  3. Gradient Descent: update the weights (W_new = W_old - lr·∂L/∂W) and the biases (b_new = b_old - lr·∂L/∂b) by stepping against the gradient.
  4. Activation Functions: ReLU (mitigates vanishing gradients) and Softmax (turns the output scores into class probabilities).
  5. One-Hot Encoding: convert integer labels to vectors (e.g., label 3 → [0,0,0,1,0,0,0,0,0,0]) so the cross-entropy loss can be computed against the Softmax output.
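
The five pieces above fit together roughly as follows. This is a sketch under assumptions, not the project's actual code: the function names, the 16-unit hidden layer, the 0.05 weight scale, and the random stand-in data are all invented for illustration.

```python
import numpy as np

def relu(Z):
    return np.maximum(0, Z)

def softmax(Z):
    # Subtract the per-column max before exponentiating to avoid overflow.
    E = np.exp(Z - Z.max(axis=0, keepdims=True))
    return E / E.sum(axis=0, keepdims=True)

def one_hot(y, n_classes=10):
    Y = np.zeros((n_classes, y.size))
    Y[y, np.arange(y.size)] = 1.0   # one column per sample
    return Y

def forward(X, p):
    Z1 = p["W1"] @ X + p["b1"]      # Z = W·X + b
    A1 = relu(Z1)
    Z2 = p["W2"] @ A1 + p["b2"]
    A2 = softmax(Z2)
    return Z1, A1, A2

def backward(X, Y, p, Z1, A1, A2):
    m = X.shape[1]
    dZ2 = (A2 - Y) / m              # gradient of mean cross-entropy through Softmax
    dW2 = dZ2 @ A1.T
    db2 = dZ2.sum(axis=1, keepdims=True)
    dZ1 = (p["W2"].T @ dZ2) * (Z1 > 0)   # chain rule through ReLU
    dW1 = dZ1 @ X.T
    db1 = dZ1.sum(axis=1, keepdims=True)
    return {"W1": dW1, "b1": db1, "W2": dW2, "b2": db2}

def sgd_step(p, grads, lr=0.1):
    # W_new = W_old - lr·∂L/∂W, and the same for the biases.
    return {k: p[k] - lr * grads[k] for k in p}

# Stand-in data: 8 fake "images" with pixels in 0-1 and random labels.
rng = np.random.default_rng(0)
params = {
    "W1": rng.standard_normal((16, 784)) * 0.05, "b1": np.zeros((16, 1)),
    "W2": rng.standard_normal((10, 16)) * 0.05, "b2": np.zeros((10, 1)),
}
X = rng.random((784, 8))
y = rng.integers(0, 10, size=8)
Y = one_hot(y)

Z1, A1, A2 = forward(X, params)
grads = backward(X, Y, params, Z1, A1, A2)
new_params = sgd_step(params, grads, lr=0.1)
```

Each column of `A2` is a probability distribution over the ten digits, and every gradient has the same shape as the parameter it updates, which is a quick sanity check worth doing by hand.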

Section 04

Dataset and Experimental Results

MNIST Dataset Details

MNIST is a classic handwritten digit dataset:

  • Training set: 60,000 images, Test set: 10,000 images
  • Image size: 28×28 grayscale pixels, Categories: 10 (0-9)
  • Pixel value range: 0-255 (normalized to 0-1)
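
The flattening and normalization steps can be sketched like this. Random arrays stand in for the real images so the snippet runs without downloading anything; `preprocess` is a hypothetical helper name, and the (784, m) column layout is an assumption chosen to match Z = W·X + b.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for MNIST: 5 fake 28x28 grayscale images with values 0-255.
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

def preprocess(images):
    """Flatten each image to 784 values and scale pixels from 0-255 to 0-1."""
    flat = images.reshape(images.shape[0], -1).astype(np.float32)
    return (flat / 255.0).T   # shape (784, m): one column per image

X = preprocess(images)
print(X.shape, X.dtype, X.min(), X.max())
```

Converting to floating point before dividing matters: integer pixel arrays would otherwise need an explicit cast somewhere, and doing it once up front keeps the rest of the pipeline in one dtype.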

Experimental Results

The test accuracy is approximately 90%. The network uses no batch normalization, dropout, or convolutional layers, so this is best read as a baseline rather than a ceiling: tuned MLPs commonly reach around 98% on MNIST, and CNNs usually reach 99%+. The result still shows that a basic, hand-written neural network learns the task.

Section 05

Learning Value of the Project

Why Is It Worth Learning?

  1. Principle First: with no framework API to lean on, you have to understand the underlying logic yourself: matrix operations, the chain rule, and gradient updates.
  2. Math-Code Correspondence: translate abstract formulas into NumPy code (e.g., matrix multiplication → np.dot, ReLU → np.maximum).
  3. Improve Debugging Skills: manually check that dimensions match, verify that gradients are correct, and fix numerical stability issues.
  4. Foundation for Framework Source Code: after working through this project, the source code of autograd and the optimizers in PyTorch/TensorFlow becomes easier to read.
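
One common way to verify gradient correctness is to compare the analytic gradient against central finite differences. The sketch below does this for a single Softmax layer on tiny random data; the sizes, the function names, and the choice of a one-layer model are assumptions made to keep the check short.

```python
import numpy as np

def softmax(Z):
    # Shifting by the column max keeps np.exp from overflowing.
    E = np.exp(Z - Z.max(axis=0, keepdims=True))
    return E / E.sum(axis=0, keepdims=True)

def loss_and_grad(W, X, Y):
    """Mean cross-entropy of softmax(W @ X) and its analytic gradient w.r.t. W."""
    m = X.shape[1]
    P = softmax(W @ X)
    loss = -np.sum(Y * np.log(P)) / m
    grad = (P - Y) @ X.T / m
    return loss, grad

def numerical_grad(W, X, Y, eps=1e-5):
    """Central finite differences, perturbing one weight at a time."""
    G = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        Wp, Wm = W.copy(), W.copy()
        Wp[idx] += eps
        Wm[idx] -= eps
        G[idx] = (loss_and_grad(Wp, X, Y)[0] - loss_and_grad(Wm, X, Y)[0]) / (2 * eps)
    return G

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 6))               # 4 features, 6 samples
Y = np.eye(3)[:, rng.integers(0, 3, size=6)]  # one-hot labels, 3 classes
W = rng.standard_normal((3, 4))

_, analytic = loss_and_grad(W, X, Y)
numeric = numerical_grad(W, X, Y)
rel_err = np.linalg.norm(analytic - numeric) / (
    np.linalg.norm(analytic) + np.linalg.norm(numeric)
)
print("relative error:", rel_err)
```

A relative error many orders of magnitude below 1 suggests the backward pass matches the loss; an error near 1 usually points to a sign mistake, a missing 1/m factor, or a transposed matrix.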

Section 06

Learning Path and Extension Directions

Prerequisites

Python basics, NumPy operations, linear algebra (matrix operations), calculus (partial derivatives/chain rule).

Learning Sequence

  1. Load and visualize the MNIST data
  2. Initialize parameters
  3. Implement forward propagation
  4. Calculate the loss
  5. Implement backpropagation
  6. Gradient descent update
  7. Training loop
  8. Test evaluation
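
The learning sequence above maps onto a short script like the one below. To keep the loop itself in focus, this sketch makes two loud simplifications that are not from the project: it trains a single Softmax layer instead of the full MLP, and it uses synthetic Gaussian blobs instead of MNIST so that it runs offline. The learning rate, batch size, and epoch count are also arbitrary choices, and the visualization step is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1 (stand-in): three well-separated Gaussian blobs in 20 dimensions.
n_classes, n_features, n_per_class = 3, 20, 200
centers = rng.standard_normal((n_classes, n_features)) * 3.0
X = np.vstack([c + rng.standard_normal((n_per_class, n_features)) for c in centers]).T
y = np.repeat(np.arange(n_classes), n_per_class)
perm = rng.permutation(X.shape[1])
train, test = perm[:500], perm[500:]

def softmax(Z):
    E = np.exp(Z - Z.max(axis=0, keepdims=True))
    return E / E.sum(axis=0, keepdims=True)

# Step 2: initialize parameters.
W = np.zeros((n_classes, n_features))
b = np.zeros((n_classes, 1))

lr, epochs, batch = 0.01, 20, 50
losses = []
for epoch in range(epochs):                       # Step 7: training loop
    order = rng.permutation(train)                # reshuffle every epoch
    for start in range(0, order.size, batch):
        idx = order[start:start + batch]
        Xb = X[:, idx]
        Yb = np.eye(n_classes)[:, y[idx]]         # one-hot labels
        P = softmax(W @ Xb + b)                   # Step 3: forward pass
        dZ = (P - Yb) / idx.size                  # Step 5: backward pass
        W -= lr * (dZ @ Xb.T)                     # Step 6: gradient descent
        b -= lr * dZ.sum(axis=1, keepdims=True)
    # Step 4: mean cross-entropy on the training split after each epoch.
    P_train = softmax(W @ X[:, train] + b)
    losses.append(-np.mean(np.log(P_train[y[train], np.arange(train.size)] + 1e-12)))

# Step 8: evaluate on held-out samples.
pred = np.argmax(W @ X[:, test] + b, axis=0)
test_acc = np.mean(pred == y[test])
print("final loss:", losses[-1], "test accuracy:", test_acc)
```

Swapping in the real dataset and a hidden layer changes the forward and backward lines but leaves the loop structure (shuffle, slice a mini-batch, forward, backward, update, evaluate) the same.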

Extension Directions

  • Increase the number of hidden layers
  • Try other activation functions (Sigmoid/Tanh)
  • Add regularization (L2/Dropout)
  • Implement optimizers like Adam
  • Build a CNN
  • Apply the network to the Fashion-MNIST or CIFAR-10 datasets
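
As one example of these extensions, an Adam update can be written in a few lines of NumPy. This is a sketch of the standard Adam rule applied to a toy quadratic, not code from the project: `adam_step` and the `state` dictionary are hypothetical names, and the learning rate of 0.05 is chosen for the toy problem (0.001 is the more usual default).

```python
import numpy as np

def adam_step(w, grad, state, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; `state` holds the step count and both moment estimates."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad        # first moment
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2   # second moment
    m_hat = state["m"] / (1 - beta1 ** state["t"])              # bias correction
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)

# Toy problem: minimize ||w - target||^2, whose gradient is 2 (w - target).
target = np.array([1.0, -2.0, 3.0])
w = np.zeros(3)
state = {"t": 0, "m": np.zeros(3), "v": np.zeros(3)}
for _ in range(1000):
    grad = 2 * (w - target)
    w = adam_step(w, grad, state)

print(w)
```

In a network, the same function would be called once per parameter array, each with its own `state`, in place of the plain `W - lr * grad` update.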

Section 07

Summary and Target Audience

Summary

This project demonstrates the core principles of neural networks with minimal dependencies, in a back-to-basics way. The roughly 90% accuracy shows that the basic implementation works, and the project as a whole makes the case that understanding the principles matters more than knowing which API to call.

Target Audience

  • Deep learning beginners (to understand underlying principles)
  • Algorithm interview candidates (to prepare for ML technical interviews)
  • Educators (looking for teaching cases)
  • Framework source code enthusiasts
  • Learners who want to verify their own understanding

This is a rare practical resource that helps build a solid foundation in deep learning.