# Building a Neural Network from Scratch: A Complete Technical Analysis of an MNIST Classifier Implemented with NumPy

> This article provides an in-depth analysis of a complete neural network project built from scratch using only NumPy, covering the manual implementation of core mechanisms such as forward propagation, backpropagation, batch normalization, Dropout regularization, and the Adam optimizer. The final test accuracy on the MNIST dataset reaches 97.4%.

- Board: [Openclaw Geo](https://www.zingnex.cn/en/forum/board/openclaw-geo)
- Published: 2026-06-08T01:13:11.000Z
- Last activity: 2026-06-08T01:18:29.143Z
- Popularity: 154.9
- Keywords: neural network, NumPy, MNIST, backpropagation, deep learning, batch normalization, Adam optimizer, from-scratch implementation, machine learning, Python
- Page link: https://www.zingnex.cn/en/forum/thread/numpymnist-ced60d63
- Canonical: https://www.zingnex.cn/forum/thread/numpymnist-ced60d63

---

## [Introduction] Implementing an MNIST Classifier from Scratch with Pure NumPy: Analysis of Core Technologies and Educational Value

This article analyzes an MNIST handwritten digit classifier built from scratch using only NumPy. It covers the manual implementation of forward propagation, backpropagation, batch normalization, Dropout regularization, and the Adam optimizer, and the resulting model reaches 97.4% test accuracy. The project's purpose is educational: it helps learners understand the principles underlying deep learning, which is the kind of understanding that separates a deep learning engineer from an API user.

## Project Background and Design Philosophy

With mature deep learning frameworks, training a model through API calls is easy, but understanding the underlying mathematics and training mechanics remains a key skill. This project uses no framework such as PyTorch or TensorFlow; every core component is implemented in pure NumPy. The design goal is for learners to derive each computational step by hand and so grasp the "why" rather than just the "how", instead of treating the operations a framework encapsulates as a black box.

## Network Architecture and Key Technical Components

- **Network Structure**: The input layer is 784-dimensional (flattened 28×28 images). It is followed by two hidden layers of 256 and 128 units, each with ReLU activation, batch normalization (BN), and Dropout, and by a 10-unit output layer with Softmax.
- **Key Components**: He initialization, which keeps activation variance stable across ReLU layers and mitigates vanishing gradients; batch normalization, which accelerates convergence and has a regularizing effect; inverted Dropout, which rescales activations during training so that inference needs no extra scaling; and a numerically stable Softmax that avoids floating-point overflow.
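The components listed above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the project's actual code: the function names (`he_init`, `softmax`, `inverted_dropout`), the random seed, and the `keep_prob` value used in the usage lines are assumptions, since the article does not state them. Only the layer sizes 784, 256, 128, and 10 come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility (assumption)

def he_init(fan_in, fan_out):
    """He initialization: zero-mean Gaussian with variance 2 / fan_in."""
    return rng.standard_normal((fan_in, fan_out)) * np.sqrt(2.0 / fan_in)

def softmax(z):
    """Row-wise softmax; subtracting the row maximum keeps exp() from overflowing."""
    shifted = z - z.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def inverted_dropout(a, keep_prob, training):
    """Inverted dropout: scale kept units by 1/keep_prob during training,
    so the layer is an identity at inference time."""
    if not training:
        return a, None
    mask = (rng.random(a.shape) < keep_prob) / keep_prob
    return a * mask, mask

# Layer sizes taken from the article: 784 -> 256 -> 128 -> 10.
sizes = [784, 256, 128, 10]
weights = [he_init(n_in, n_out) for n_in, n_out in zip(sizes[:-1], sizes[1:])]

# Usage: large logits that would overflow a naive exp() stay finite here.
probs = softmax(np.array([[1000.0, 1001.0, 1002.0]]))
dropped, mask = inverted_dropout(np.ones((4, 5)), keep_prob=0.8, training=True)
```

Returning the mask from `inverted_dropout` is a design choice for the backward pass, where the same mask multiplies the incoming gradient.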

## Implementation of Forward/Backward Propagation and Adam Optimizer

- **Forward Propagation**: Each hidden layer combines a linear transformation with ReLU activation, batch normalization, and (during training) Dropout; the output layer applies Softmax.
- **Backward Propagation**: With Softmax and cross-entropy loss, the gradient with respect to the output-layer logits simplifies to (Ŷ − Y)/m, where m is the batch size. Hidden-layer gradients are propagated backward via the chain rule, including a complete implementation of the BN gradient derivation.
- **Adam Optimizer**: Follows the update rules of the original paper with hyperparameters α=0.001, β₁=0.9, β₂=0.999, combined with step learning-rate decay (×0.95 every 10 epochs).
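As a sketch of the two less obvious pieces described above, the following shows a training-mode batch normalization layer with its backward pass, and an Adam update with the stated step decay. It is written under assumptions: the function names, the `eps` values, and the dictionary-based optimizer state are illustrative and not taken from the project, and the running statistics that BN needs at inference time are omitted for brevity. The hyperparameters α=0.001, β₁=0.9, β₂=0.999 and the ×0.95 decay every 10 epochs are from the text.

```python
import numpy as np

def batchnorm_forward(z, gamma, beta, eps=1e-5):
    """Training-mode BN over the batch axis (rows are samples)."""
    mu = z.mean(axis=0)
    var = z.var(axis=0)                      # biased variance, as in standard BN
    inv_std = 1.0 / np.sqrt(var + eps)
    z_hat = (z - mu) * inv_std
    return gamma * z_hat + beta, (z_hat, gamma, inv_std)

def batchnorm_backward(dout, cache):
    """Gradients w.r.t. the BN input, gamma, and beta (chain rule, compact form)."""
    z_hat, gamma, inv_std = cache
    m = dout.shape[0]
    dgamma = (dout * z_hat).sum(axis=0)
    dbeta = dout.sum(axis=0)
    dz_hat = dout * gamma
    dz = (inv_std / m) * (
        m * dz_hat
        - dz_hat.sum(axis=0)
        - z_hat * (dz_hat * z_hat).sum(axis=0)
    )
    return dz, dgamma, dbeta

def adam_step(param, grad, state, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with bias-corrected first and second moments."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - lr * m_hat / (np.sqrt(v_hat) + eps)

def decayed_lr(epoch, base_lr=0.001, factor=0.95, every=10):
    """Step decay: multiply the learning rate by `factor` every `every` epochs."""
    return base_lr * factor ** (epoch // every)

# Usage: one Adam update on a toy parameter at epoch 0.
w = np.array([1.0, -2.0])
state = {"t": 0, "m": np.zeros_like(w), "v": np.zeros_like(w)}
w = adam_step(w, grad=np.array([0.5, -0.5]), state=state, lr=decayed_lr(0))
```

Keeping one state dictionary per parameter array is one simple way to track the moment estimates; whether `epoch` counts from 0 or 1 shifts the decay boundary by one epoch, which the article does not specify.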

## Training Results and Performance Analysis

After 50 epochs of training, the test accuracy is 97.4%, and the macro-averaged precision, recall, and F1 are each 97.0%. Per class, digit "1" is recognized best (F1=99.2%), while digit "5" is the most challenging (F1=95.5%). The training curve shows rapid convergence in the first 10 epochs, followed by a slow decrease in loss with no obvious overfitting.
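For readers unfamiliar with macro averaging, the metrics quoted above can be computed from a confusion matrix as in the sketch below. The function name `macro_metrics` and the toy labels are assumptions made for illustration; they are not the project's code or data.

```python
import numpy as np

def macro_metrics(y_true, y_pred, num_classes=10):
    """Accuracy plus macro-averaged precision, recall, and F1.

    Macro averaging computes each metric per class and then takes the
    unweighted mean, so every digit counts equally regardless of frequency.
    Classes with no predictions or no samples contribute 0 to the mean.
    """
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)       # rows: true class, cols: predicted
    tp = np.diag(cm).astype(float)
    pred_count = cm.sum(axis=0)
    true_count = cm.sum(axis=1)
    precision = np.divide(tp, pred_count, out=np.zeros_like(tp), where=pred_count > 0)
    recall = np.divide(tp, true_count, out=np.zeros_like(tp), where=true_count > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision.mean(), recall.mean(), f1.mean(), f1

# Usage on toy labels (three classes, six samples); not MNIST results.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
acc, macro_p, macro_r, macro_f1, per_class_f1 = macro_metrics(y_true, y_pred, num_classes=3)
```

The per-class F1 array returned last is what a comparison such as "1" versus "5" would be read from.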

## Practical Significance and Learning Value

- **Theoretical Verification**: Manually implementing backpropagation verifies one's understanding of the chain rule.
- **Framework Understanding**: Knowing how frameworks work underneath aids model debugging.
- **Numerical Computation**: Learning to solve problems such as Softmax overflow.
- **Teaching Application**: The project is well suited as material for machine learning courses, helping students move from calling APIs to understanding principles.
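One common way to carry out the theoretical verification mentioned above is a finite-difference gradient check. The sketch below compares the analytic output-layer gradient (Ŷ − Y)/m against a central-difference estimate for a Softmax plus cross-entropy loss. The function names, the step size `h`, and the random test data are assumptions for illustration; the article does not say whether the project includes such a check.

```python
import numpy as np

def log_softmax(z):
    """Numerically stable log-softmax, row-wise."""
    shifted = z - z.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def cross_entropy(z, y_onehot):
    """Mean cross-entropy of softmax(z) against one-hot labels."""
    return -(y_onehot * log_softmax(z)).sum() / z.shape[0]

def analytic_grad(z, y_onehot):
    """Gradient of the loss w.r.t. the logits: (Y_hat - Y) / m."""
    return (np.exp(log_softmax(z)) - y_onehot) / z.shape[0]

def numeric_grad(f, z, h=1e-5):
    """Central-difference estimate of df/dz, one entry at a time."""
    grad = np.zeros_like(z)
    for idx in np.ndindex(z.shape):
        old = z[idx]
        z[idx] = old + h
        plus = f(z)
        z[idx] = old - h
        minus = f(z)
        z[idx] = old                          # restore the original value
        grad[idx] = (plus - minus) / (2 * h)
    return grad

# Usage: compare both gradients on random logits and labels.
rng = np.random.default_rng(0)
z = rng.standard_normal((4, 10))
y = np.eye(10)[rng.integers(0, 10, size=4)]
max_diff = np.max(np.abs(numeric_grad(lambda t: cross_entropy(t, y), z) - analytic_grad(z, y)))
print("max abs difference:", max_diff)
```

A small difference suggests the hand-derived gradient matches the loss; the same check can be applied layer by layer to the rest of a backward pass.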

## Summary and Insights

Although this pure NumPy project is small, it covers core techniques of modern deep learning and shows that a solid grasp of the principles is enough to build a model with strong accuracy. For learners, building from scratch teaches more than parameter tuning alone. The underlying mathematics and algorithmic ideas outlast any particular framework, and an engineer needs to understand the "why" rather than just the "how".