Zing Forum

From Scratch Implementation of a Feedforward Neural Network with NumPy: Best Practices for Understanding Core Deep Learning Mechanisms

This article provides an in-depth analysis of a feedforward neural network project implemented purely with NumPy, covering He initialization, custom backpropagation, comparison of multiple optimizers (SGD, Momentum, Adam, AdamW), and model serialization. It is an excellent learning resource for understanding the underlying principles of deep learning.

Tags: NumPy, neural network, backpropagation, He initialization, optimizer, SGD, Adam, AdamW, deep learning, machine learning
Published 2026-06-16 21:14 | Recent activity 2026-06-16 21:19 | Estimated read 6 min

Section 01

Introduction: A Pure NumPy Feedforward Neural Network as a Practical Guide to the Underlying Mechanisms of Deep Learning

This article introduces Dawood-Amir's numpy-ffn-from-scratch project on GitHub, which implements a feedforward neural network purely with NumPy for Iris dataset classification. The project covers He initialization, custom backpropagation, comparison of multiple optimizers (SGD, Momentum, Adam, AdamW), and model serialization, making it a high-quality resource for understanding the underlying principles of deep learning.

Section 02

Project Background: Why Implement a Neural Network From Scratch?

In today's era of mature frameworks like PyTorch and TensorFlow, the significance of implementing from scratch lies in understanding the underlying mechanisms. Writing backpropagation by hand, implementing He initialization, and comparing optimizer performances can make the "black box" concepts encapsulated by frameworks transparent. This project targets Iris dataset classification, with rich code comments and a clear structure, making it easy to follow the data flow and understand each mathematical operation step by step.

Section 03

Network Architecture and He Initialization Strategy

The project uses a network with three weight layers (two hidden layers and one output layer): Input layer (4 neurons, corresponding to the 4 features of Iris) → First hidden layer (16 neurons, ReLU activation) → Second hidden layer (4 neurons, ReLU activation) → Output layer (3 neurons, Softmax activation, one per Iris class). Weights are initialized with the He strategy, W = np.random.randn(input_size, output_size) * np.sqrt(2.0 / input_size). Scaling the variance to 2 / input_size compensates for ReLU zeroing out roughly half of its inputs, which helps keep the signal from shrinking or blowing up as it passes through the layers.
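The architecture and initialization described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names (he_init, init_params, forward) and the dict layout are hypothetical, biases are assumed to start at zero, and a seeded np.random.default_rng generator is used in place of np.random.randn so the run is reproducible.

```python
import numpy as np

def he_init(input_size, output_size, rng):
    """He initialization: zero-mean Gaussian scaled by sqrt(2 / input_size)."""
    return rng.standard_normal((input_size, output_size)) * np.sqrt(2.0 / input_size)

def init_params(layer_sizes=(4, 16, 4, 3), seed=0):
    """Build one {W, b} dict per weight layer (hypothetical layout)."""
    rng = np.random.default_rng(seed)
    return [
        {"W": he_init(n_in, n_out, rng), "b": np.zeros(n_out)}  # zero biases: an assumption
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    # Subtract the row-wise max for numerical stability before exponentiating.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(params, X):
    """ReLU on both hidden layers, Softmax on the output layer."""
    a = X
    for layer in params[:-1]:
        a = relu(a @ layer["W"] + layer["b"])
    logits = a @ params[-1]["W"] + params[-1]["b"]
    return softmax(logits)

params = init_params()
X_demo = np.random.default_rng(1).standard_normal((5, 4))  # 5 fake samples, 4 features
probs = forward(params, X_demo)
print(probs.shape)  # (5, 3): one probability per class for each sample
```

Each row of probs sums to 1, so it can be read directly as class probabilities for the three Iris species.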

Section 04

Custom Backpropagation: Core Analysis of Gradient Flow

The handwritten backpropagation in the project consists of four key steps:

1. Gradient of Softmax cross-entropy: the output probabilities are compared with the true labels, which reduces to the predicted probabilities minus the one-hot labels.
2. Gradient backpropagation through the hidden layers: the error is passed backward by multiplying with the transpose of each weight matrix.
3. ReLU gradient masking: the derivative is 1 where the pre-activation was positive and 0 elsewhere, so gradients only flow through active units.
4. Parameter update: weights and biases are adjusted using the learning rate and the computed gradients.

This implementation turns the abstract concept of "gradient descent" into concrete matrix operations.
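The four steps above can be sketched in NumPy as follows. This is an illustrative reconstruction under stated assumptions, not the project's own code: the function names, the (W, b) tuple layout, the mean-over-batch loss, the learning rate, and the random stand-in data are all hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(params, X):
    """Return all layer activations and pre-activations for use in backward()."""
    acts, zs = [X], []
    for i, (W, b) in enumerate(params):
        z = acts[-1] @ W + b
        zs.append(z)
        is_output = i == len(params) - 1
        acts.append(softmax(z) if is_output else np.maximum(0.0, z))
    return acts, zs

def cross_entropy(probs, Y_onehot):
    # Mean cross-entropy over the batch; the small constant guards against log(0).
    return -np.mean(np.sum(Y_onehot * np.log(probs + 1e-12), axis=1))

def backward(params, acts, zs, Y_onehot):
    n = Y_onehot.shape[0]
    grads = [None] * len(params)
    # Step 1: Softmax + cross-entropy gradient is (probabilities - labels).
    delta = (acts[-1] - Y_onehot) / n
    for i in reversed(range(len(params))):
        grads[i] = (acts[i].T @ delta, delta.sum(axis=0))
        if i > 0:
            W = params[i][0]
            # Step 2: propagate through W transposed.
            # Step 3: mask with the ReLU derivative (1 where z > 0, else 0).
            delta = (delta @ W.T) * (zs[i - 1] > 0)
    return grads

def sgd_step(params, grads, lr=0.1):
    # Step 4: plain gradient descent update.
    return [(W - lr * dW, b - lr * db) for (W, b), (dW, db) in zip(params, grads)]

# Random stand-in data (not Iris), just to exercise the code path.
rng = np.random.default_rng(0)
sizes = (4, 16, 4, 3)
params = [(rng.standard_normal((i, o)) * np.sqrt(2.0 / i), np.zeros(o))
          for i, o in zip(sizes[:-1], sizes[1:])]
X = rng.standard_normal((30, 4))
Y = np.eye(3)[rng.integers(0, 3, size=30)]

for epoch in range(100):
    acts, zs = forward(params, X)
    params = sgd_step(params, backward(params, acts, zs, Y))
print("final loss:", cross_entropy(forward(params, X)[0][-1], Y))
```

A quick way to gain confidence in a handwritten backward pass like this is to compare one analytic gradient entry against a finite-difference estimate of the loss.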

Section 05

Optimizer Comparison Experiment: A Hands-On Comparison of SGD, Momentum, Adam, and AdamW

The project has a built-in optimizer comparison experiment in which four optimizers are run on the same dataset and architecture: vanilla SGD (the baseline), SGD with Momentum (adds a velocity term that accumulates past gradients), Adam (adapts the step size per parameter), and AdamW (Adam with decoupled weight decay). The comparison is kept fair: each optimizer starts from a freshly initialized model and trains for 150 epochs while accuracy is tracked, and the best model is automatically saved as best_iris_model.pkl.
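The update rules of the four optimizers can be sketched as below. This is a simplified illustration, not the project's code: the function signatures, the per-parameter state dict, the hyperparameter values, and the toy objective f(w) = ||w||^2 (used instead of the Iris network) are all assumptions. As in the experiment described above, every optimizer starts from the same initial point with fresh state and runs for 150 steps.

```python
import numpy as np

def sgd(w, g, state, lr=0.1):
    return w - lr * g

def momentum(w, g, state, lr=0.01, beta=0.9):
    # Velocity accumulates an exponentially decayed sum of past gradients.
    state["v"] = beta * state.get("v", 0.0) + g
    return w - lr * state["v"]

def adam(w, g, state, lr=0.1, b1=0.9, b2=0.999, eps=1e-8, weight_decay=0.0):
    t = state["t"] = state.get("t", 0) + 1
    m = state["m"] = b1 * state.get("m", 0.0) + (1 - b1) * g      # first moment
    v = state["v"] = b2 * state.get("v", 0.0) + (1 - b2) * g * g  # second moment
    m_hat = m / (1 - b1 ** t)  # bias correction
    v_hat = v / (1 - b2 ** t)
    # Decoupled weight decay: applied to w directly, not mixed into the gradient.
    return w - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * w)

def adamw(w, g, state, lr=0.1, weight_decay=0.01, **kw):
    return adam(w, g, state, lr=lr, weight_decay=weight_decay, **kw)

def run(step_fn, steps=150):
    """Minimize f(w) = ||w||^2 from a fixed start with fresh optimizer state."""
    w, state = np.array([5.0, -3.0]), {}
    for _ in range(steps):
        g = 2.0 * w  # gradient of the toy objective
        w = step_fn(w, g, state)
    return float(np.sum(w ** 2))

results = {fn.__name__: run(fn) for fn in (sgd, momentum, adam, adamw)}
for name, loss in results.items():
    print(f"{name:>8}: final loss = {loss:.3e}")
```

On a toy quadratic the ranking says little about Iris; the point is only to show where the four update rules differ, in particular that AdamW shrinks the weights directly rather than adding the decay term to the gradient.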

Section 06

Data Pipeline and Inference Deployment Practice

Data processing: the Iris dataset is loaded with scikit-learn, standardized with Z-score normalization (each feature rescaled to mean 0 and standard deviation 1), and split 80/20 into training and test sets. Inference deployment: the saved best_iris_model.pkl is loaded, new samples are passed through the network, and the predicted class is mapped to a readable name (e.g., "Iris-setosa"), which completes an end-to-end application.
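A rough sketch of this pipeline is shown below, written in plain NumPy so it runs without extra dependencies. Several things here are assumptions rather than facts about the project: the helper names, the layout of the pickled dict, computing the normalization statistics on the training split only, and the class order (which follows scikit-learn's Iris labels 0, 1, 2). The data is a random stand-in and the weights are untrained; with scikit-learn installed, X and y could instead come from sklearn.datasets.load_iris(return_X_y=True).

```python
import os
import pickle
import tempfile
import numpy as np

CLASS_NAMES = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]

def split_80_20(X, y, seed=0):
    """Shuffle, then keep the first 80% for training and the rest for testing."""
    idx = np.random.default_rng(seed).permutation(len(X))
    cut = len(X) * 4 // 5
    return X[idx[:cut]], X[idx[cut:]], y[idx[:cut]], y[idx[cut:]]

def zscore(X_train, X_test):
    """Z-score both splits using training-set statistics (an assumption)."""
    mean, std = X_train.mean(axis=0), X_train.std(axis=0)
    return (X_train - mean) / std, (X_test - mean) / std, mean, std

def predict_names(model, X_raw):
    """Normalize raw samples, run the forward pass, map class index to name."""
    a = (np.asarray(X_raw, dtype=float) - model["mean"]) / model["std"]
    for W, b in model["params"][:-1]:
        a = np.maximum(0.0, a @ W + b)
    W, b = model["params"][-1]
    logits = a @ W + b  # argmax of logits equals argmax of Softmax
    return [CLASS_NAMES[i] for i in logits.argmax(axis=1)]

# Stand-in data with Iris-like shape: 150 samples, 4 features, 3 classes.
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(150, 4))
y = rng.integers(0, 3, size=150)

X_train, X_test, y_train, y_test = split_80_20(X, y)
X_train_n, X_test_n, mean, std = zscore(X_train, X_test)

# Untrained He-initialized weights, used only to demonstrate save and load.
sizes = (4, 16, 4, 3)
params = [(rng.standard_normal((i, o)) * np.sqrt(2.0 / i), np.zeros(o))
          for i, o in zip(sizes[:-1], sizes[1:])]
model = {"params": params, "mean": mean, "std": std}

path = os.path.join(tempfile.mkdtemp(), "best_iris_model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)
with open(path, "rb") as f:
    loaded = pickle.load(f)

print(predict_names(loaded, X_test[:3]))
```

Storing the normalization statistics alongside the weights means inference applies exactly the same scaling as training. Note that pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.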

Section 07

Learning Value and Summary Reflections

For beginners, the project offers transparency (nothing is hidden behind a framework), completeness (it covers the entire ML workflow from data loading to inference), an experimental mindset (the optimizer comparison encourages scientific thinking), and extensibility (the modular design is easy to build on). For experienced developers, it can serve as a tool for rapid prototype verification. In summary, understanding the underlying mechanisms is the cornerstone of building complex systems. This project is worth studying and experimenting with, and it is a reminder that good engineers need to understand the principles behind their tools.