
Building Neural Networks from Scratch: Implementing Core Deep Learning Principles with Pure Python

A deep learning framework implemented entirely in native Python, without PyTorch, TensorFlow, or NumPy. By writing matrix operations, backpropagation, and optimization algorithms by hand, it helps users understand in depth how neural networks work.

Tags: neural networks, deep learning, Python, backpropagation, machine learning, from-scratch implementation, education

Published 2026-06-10 08:42

Section 01

Building Neural Networks from Scratch: Implementing Core Deep Learning Principles with Pure Python (Introduction)

This project implements a deep learning framework in pure native Python, without relying on PyTorch, TensorFlow, or NumPy. It covers core components such as matrix operations, backpropagation, and optimization algorithms, and includes practical demos such as a square detector to help learners understand in depth how neural networks work.

Section 02

Project Background and Motivation

The high-level APIs of modern deep learning frameworks such as PyTorch and TensorFlow hide the underlying mathematics. As a result, many developers use neural networks without understanding how they work internally, which limits both their comprehension and their ability to debug models. Tzur Soffer created this project to implement a neural network framework in pure native Python with no third-party dependencies, so that learners can follow every component line by line.

Section 03

Core Components of Neural Network Basic Architecture

Matrix Operation Layer

Matrix multiplication, addition, transposition, and batch processing are implemented by hand without NumPy, exposing how tensor operations actually execute underneath.
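A minimal pure-Python sketch of these operations, assuming a matrix is stored as a list of row lists. The names `matmul`, `transpose`, and `mat_add` are illustrative and may not match the project's own functions.

```python
from typing import List

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    """Swap rows and columns: result[j][i] == a[i][j]."""
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix with three nested loops."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    n, k, m = len(a), len(b), len(b[0])
    result = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            total = 0.0
            for p in range(k):
                total += a[i][p] * b[p][j]
            result[i][j] = total
    return result


def mat_add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise addition of two matrices with the same shape."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# A batch is a matrix with one sample per row, so one matmul processes every sample.
batch = [[1.0, 2.0], [3.0, 4.0]]        # 2 samples, 2 features each
weights = [[0.5, -1.0], [0.25, 2.0]]    # maps 2 inputs to 2 outputs
print(matmul(batch, weights))            # [[1.0, 3.0], [2.5, 5.0]]
print(transpose(batch))                  # [[1.0, 3.0], [2.0, 4.0]]
```

Treating a batch as a matrix with one sample per row is what lets a single `matmul` call handle every sample at once.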

Fully Connected Layer

Each layer initializes a weight matrix and a bias vector, computes its forward pass, and updates its parameters during backpropagation. The core formula is output = activation(weights · inputs + bias).
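The layer described above could be sketched as follows. The class name `Dense`, the one-sample-per-row layout (which computes inputs · weights + bias, the transposed form of the formula above), and the uniform initialization range are assumptions for illustration, not the project's actual code. The activation is kept separate from the layer here.

```python
import random
from typing import List

Matrix = List[List[float]]


class Dense:
    """Hypothetical fully connected layer computing inputs · weights + bias for a batch.

    Each row of `inputs` is one sample, so this is the row-vector (transposed)
    form of weights · inputs + bias.
    """

    def __init__(self, n_inputs: int, n_outputs: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Small random weights break the symmetry between neurons; biases start at zero.
        self.weights: Matrix = [[rng.uniform(-0.5, 0.5) for _ in range(n_outputs)]
                                for _ in range(n_inputs)]
        self.bias: List[float] = [0.0] * n_outputs
        self.inputs: Matrix = []

    def forward(self, inputs: Matrix) -> Matrix:
        self.inputs = inputs  # cached: the weight gradient needs it later
        return [[sum(x[i] * self.weights[i][j] for i in range(len(x))) + self.bias[j]
                 for j in range(len(self.bias))]
                for x in inputs]

    def backward(self, grad_out: Matrix, lr: float) -> Matrix:
        """Receive dL/d(output), update weights and bias, return dL/d(inputs)."""
        n_in, n_out = len(self.weights), len(self.bias)
        # dL/dW[i][j] = sum over the batch of inputs[i] * grad_out[j]
        grad_w = [[sum(x[i] * g[j] for x, g in zip(self.inputs, grad_out))
                   for j in range(n_out)] for i in range(n_in)]
        # dL/db[j] = sum over the batch of grad_out[j]
        grad_b = [sum(g[j] for g in grad_out) for j in range(n_out)]
        # dL/dx[i] = sum over outputs of grad_out[j] * W[i][j], using the pre-update weights
        grad_in = [[sum(g[j] * self.weights[i][j] for j in range(n_out))
                    for i in range(n_in)] for g in grad_out]
        for i in range(n_in):
            for j in range(n_out):
                self.weights[i][j] -= lr * grad_w[i][j]
        for j in range(n_out):
            self.bias[j] -= lr * grad_b[j]
        return grad_in


layer = Dense(n_inputs=3, n_outputs=2)
out = layer.forward([[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]])      # 2 samples -> 2 x 2 output
grad_in = layer.backward([[0.1, -0.1], [0.0, 0.2]], lr=0.01)  # 2 x 3 gradient for the previous layer
```

Caching the inputs during the forward pass is what makes the weight gradient computable later, and returning `grad_in` lets the previous layer continue the chain rule.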

Activation Functions

The project implements the Pass (linear, i.e. identity), ReLU, LeakyReLU, and Softmax activation functions. Each one provides both its forward computation and the derivative that backpropagation needs.
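A sketch of these activations and their derivatives, written per scalar except for Softmax, which needs the whole vector. The LeakyReLU slope `alpha = 0.01` is a common default assumed here; the project may use a different value.

```python
import math
from typing import List


def linear(x: float) -> float:  # "Pass": identity activation
    return x


def linear_grad(x: float) -> float:
    return 1.0


def relu(x: float) -> float:
    return x if x > 0 else 0.0


def relu_grad(x: float) -> float:
    # The derivative at exactly 0 is undefined; using 0 there is a convention.
    return 1.0 if x > 0 else 0.0


def leaky_relu(x: float, alpha: float = 0.01) -> float:
    return x if x > 0 else alpha * x


def leaky_relu_grad(x: float, alpha: float = 0.01) -> float:
    return 1.0 if x > 0 else alpha


def softmax(z: List[float]) -> List[float]:
    # Subtracting the max leaves the result unchanged but avoids overflow in exp().
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_jacobian(z: List[float]) -> List[List[float]]:
    # d p_i / d z_j = p_i * (delta_ij - p_j)
    p = softmax(z)
    return [[p[i] * ((1.0 if i == j else 0.0) - p[j]) for j in range(len(p))]
            for i in range(len(p))]


print([leaky_relu(v) for v in (-2.0, 0.0, 3.0)])
print(softmax([1.0, 2.0, 3.0]))
```

Because every Softmax output depends on every input, its derivative is a full Jacobian rather than one number per element. In practice it is usually paired with cross-entropy, where the product collapses to a much simpler expression.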

Section 04

Detailed Explanation of Training Mechanism

Forward Propagation

Data flows layer by layer from the input layer to the output layer. Each layer applies its matrix operation followed by its activation function. For classification tasks, the output layer typically uses Softmax to turn raw scores into a probability distribution.
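A minimal sketch of a forward pass through a LeakyReLU hidden layer and a Softmax output layer. The weights are arbitrary numbers chosen only for illustration, and the `(weights, bias, activation)` tuple layout is an assumption.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
# A layer is (weights[n_in][n_out], bias[n_out], activation applied to the whole vector).
Layer = Tuple[Matrix, Vector, Callable[[Vector], Vector]]


def leaky_relu(v: Vector, alpha: float = 0.01) -> Vector:
    return [x if x > 0 else alpha * x for x in v]


def softmax(v: Vector) -> Vector:
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def dense(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    # z_j = sum_i x_i * W[i][j] + b_j
    return [sum(x[i] * weights[i][j] for i in range(len(x))) + bias[j]
            for j in range(len(bias))]


def forward(x: Vector, layers: List[Layer]) -> Vector:
    # Each layer's output becomes the next layer's input.
    for weights, bias, activation in layers:
        x = activation(dense(x, weights, bias))
    return x


network: List[Layer] = [
    ([[0.2, -0.4, 0.1], [0.7, 0.3, -0.5]], [0.0, 0.1, 0.0], leaky_relu),  # 2 -> 3 hidden
    ([[0.5, -0.5], [-0.3, 0.3], [0.8, 0.1]], [0.0, 0.0], softmax),        # 3 -> 2 output
]
probs = forward([1.0, 0.5], network)
print(probs, sum(probs))  # two class probabilities; their sum is 1 up to rounding
```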

Loss Function

Cross-entropy loss measures the difference between the predicted probability distribution and the true labels. A prediction that is both wrong and confident produces a large loss value, and therefore a strong correction signal.
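A sketch of batch-averaged cross-entropy with one-hot targets; averaging over the N samples matches the /N in the Softmax + cross-entropy gradient formula. The small `eps` clamp is an assumed safeguard against log(0). The three calls show the loss growing as a wrong prediction becomes more confident.

```python
import math
from typing import List


def cross_entropy(probs: List[List[float]], targets: List[List[float]],
                  eps: float = 1e-12) -> float:
    """Mean cross-entropy over a batch: -(1/N) * sum_n sum_i t[n][i] * log(p[n][i]).

    `targets` are one-hot rows; `eps` keeps log() away from log(0).
    """
    n = len(probs)
    total = 0.0
    for p_row, t_row in zip(probs, targets):
        total -= sum(t * math.log(max(p, eps)) for p, t in zip(p_row, t_row))
    return total / n


target = [[1.0, 0.0]]                         # the true class is 0
print(cross_entropy([[0.9, 0.1]], target))    # confident and right: about 0.105
print(cross_entropy([[0.5, 0.5]], target))    # unsure: about 0.693
print(cross_entropy([[0.01, 0.99]], target))  # confident and wrong: about 4.605
```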

Backpropagation

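Backpropagation applies the chain rule to compute the gradient of the loss with respect to every weight and bias, as well as with respect to each layer's inputs, which is passed back to the previous layer. When Softmax is combined with cross-entropy, the gradient with respect to the Softmax input zᵢ simplifies to ∂L/∂zᵢ = (pᵢ − tᵢ) / N, where pᵢ is the predicted probability, tᵢ the one-hot target, and N the number of samples in the batch.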
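One way to check the (pᵢ − tᵢ) / N shortcut is to compare it against a numerical gradient from central finite differences. This check is a sketch for illustration; the function names are hypothetical and not part of the project.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(z: List[float]) -> List[float]:
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def loss(logits: Matrix, targets: Matrix) -> float:
    """Mean cross-entropy of softmax(logits) against one-hot targets."""
    n = len(logits)
    return -sum(t * math.log(p)
                for z, t_row in zip(logits, targets)
                for p, t in zip(softmax(z), t_row)) / n


def analytic_grad(logits: Matrix, targets: Matrix) -> Matrix:
    """dL/dz_i = (p_i - t_i) / N for every sample and class."""
    n = len(logits)
    return [[(p - t) / n for p, t in zip(softmax(z), t_row)]
            for z, t_row in zip(logits, targets)]


def numeric_grad(logits: Matrix, targets: Matrix, h: float = 1e-6) -> Matrix:
    """Central differences (L(z + h) - L(z - h)) / (2h), one logit at a time."""
    grad = []
    for r, row in enumerate(logits):
        grad_row = []
        for c in range(len(row)):
            plus = [list(x) for x in logits]
            minus = [list(x) for x in logits]
            plus[r][c] += h
            minus[r][c] -= h
            grad_row.append((loss(plus, targets) - loss(minus, targets)) / (2 * h))
        grad.append(grad_row)
    return grad


logits = [[2.0, -1.0, 0.5], [0.1, 0.2, 0.3]]
targets = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
a = analytic_grad(logits, targets)
g = numeric_grad(logits, targets)
max_diff = max(abs(x - y) for ra, rg in zip(a, g) for x, y in zip(ra, rg))
print(max_diff)  # should be tiny, far below 1e-6
```

The same finite-difference comparison works for any layer and is a common way to debug a hand-written backward pass.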

Gradient Descent

Parameters are updated with W_new = W_old − η · ∇_W L, where η is the learning rate. η must be chosen carefully: too large a value causes oscillation or divergence, while too small a value makes training slow. Batch learning, which averages the gradient over several samples, is supported to stabilize the gradient estimate.
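Two small sketches of the update rule. The first minimizes the toy function f(w) = (w − 3)², whose gradient is 2(w − 3), to show how the learning rate changes behavior; the second fits y ≈ w · x with mini-batches, averaging the gradient over each batch. Both functions and their settings are hypothetical examples, not the project's code.

```python
import random
from typing import List, Tuple


def descend(lr: float, steps: int = 20, w: float = 0.0) -> List[float]:
    """Minimize f(w) = (w - 3)^2 with w_new = w_old - lr * f'(w_old)."""
    history = [w]
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)  # f'(w)
        w = w - lr * grad       # each step multiplies (w - 3) by (1 - 2 * lr)
        history.append(w)
    return history


for lr in (0.001, 0.1, 0.9, 1.1):
    print(lr, descend(lr)[-1])
# 0.001: w has barely moved away from 0 (too small, slow training)
# 0.1:   w ends close to 3
# 0.9:   w overshoots 3 on every step, swinging back and forth (oscillation)
# 1.1:   |w - 3| grows on every step (divergence)


def minibatch_sgd(data: List[Tuple[float, float]], lr: float, batch_size: int,
                  epochs: int, seed: int = 0) -> float:
    """Fit y ~ w * x, updating w once per mini-batch with the batch-averaged gradient."""
    rng = random.Random(seed)
    data = list(data)  # shuffle a copy, not the caller's list
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # derivative with respect to w of the batch mean of (w*x - y)^2
            grad = sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= lr * grad
    return w


points = [(i / 10, 2.0 * i / 10) for i in range(1, 11)]  # noise-free y = 2x
print(minibatch_sgd(points, lr=0.5, batch_size=4, epochs=200))  # close to 2.0
```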

Section 05

Practical Application: Square Detector

Hilbert Curve Mapping

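A Hilbert curve, a type of space-filling curve, maps each 2D image to a 1D vector. Pixels that are close together in the image tend to stay close together in the vector, so spatial relationships survive the flattening better than with a plain row-by-row ordering. The mapping is also intended to support resolution independence.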
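A sketch of Hilbert-order flattening, based on the well-known iterative conversion from a distance along the curve to grid coordinates. It assumes square images whose side length is a power of two; the function names are hypothetical, and the project's own mapping may differ in orientation or in how it handles other sizes.

```python
from typing import List, Tuple


def hilbert_d2xy(n: int, d: int) -> Tuple[int, int]:
    """Map distance d along a Hilbert curve to (x, y) on an n x n grid (n a power of 2)."""
    x = y = 0
    s = 1
    t = d
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/flip the current sub-square so consecutive sub-curves join up.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_flatten(image: List[List[float]]) -> List[float]:
    """Flatten a square image into a 1D list by walking the Hilbert curve."""
    n = len(image)
    if n & (n - 1) or any(len(row) != n for row in image):
        raise ValueError("image must be square with a power-of-2 side length")
    out = []
    for d in range(n * n):
        x, y = hilbert_d2xy(n, d)
        out.append(image[y][x])  # y selects the row, x the column
    return out


print(hilbert_flatten([[1, 2], [3, 4]]))
print([hilbert_d2xy(4, d) for d in range(16)])
```

Consecutive entries in the output always come from pixels that share an edge in the image, which is the locality property described above; a row-by-row flatten breaks that property at the end of every row.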

Network Architecture and Training

Input layer: Receives pixel values after Hilbert mapping
Hidden layer: LeakyReLU activation function
Output layer: 2 neurons (square/non-square) + Softmax

The training process runs for thousands of iterations, each covering forward propagation, error calculation, backpropagation, and parameter updates, and it supports saving and loading weights.
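A compact end-to-end sketch of such a training loop: a LeakyReLU hidden layer, a 2-neuron Softmax output, full-batch gradient descent using the (p − t) / N shortcut, and weights saved as JSON. The toy 2D data (class 0 when x0 > x1) stands in for Hilbert-mapped images, and the layer sizes, initialization, learning rate, iteration count, and file format are all assumptions rather than the project's settings.

```python
import json
import math
import os
import random
import tempfile


def zeros_like(v):
    return [zeros_like(x) for x in v] if isinstance(v, list) else 0.0


def init_params(n_in, n_hidden, n_out, seed=0):
    rng = random.Random(seed)
    return {
        "w1": [[rng.gauss(0.0, 0.5) for _ in range(n_hidden)] for _ in range(n_in)],
        "b1": [0.0] * n_hidden,
        "w2": [[rng.gauss(0.0, 0.5) for _ in range(n_out)] for _ in range(n_hidden)],
        "b2": [0.0] * n_out,
    }


def dense(x, w, b):
    return [sum(x[i] * w[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def forward(params, x, alpha=0.01):
    z1 = dense(x, params["w1"], params["b1"])
    h = [v if v > 0 else alpha * v for v in z1]        # LeakyReLU hidden layer
    p = softmax(dense(h, params["w2"], params["b2"]))  # 2-way Softmax output
    return z1, h, p


def train_step(params, xs, ts, lr, alpha=0.01):
    """Forward, loss, backward, and one gradient-descent update over the full batch."""
    n = len(xs)
    g = {k: zeros_like(v) for k, v in params.items()}
    loss = 0.0
    for x, t in zip(xs, ts):
        z1, h, p = forward(params, x, alpha)
        loss -= sum(ti * math.log(pi) for pi, ti in zip(p, t)) / n
        dz2 = [(pi - ti) / n for pi, ti in zip(p, t)]              # Softmax + CE shortcut
        dh = [sum(d * wij for d, wij in zip(dz2, row)) for row in params["w2"]]
        dz1 = [d if z > 0 else alpha * d for d, z in zip(dh, z1)]  # LeakyReLU derivative
        for layer_in, dz, w, b in ((h, dz2, "w2", "b2"), (x, dz1, "w1", "b1")):
            for i, xi in enumerate(layer_in):
                for j, dj in enumerate(dz):
                    g[w][i][j] += xi * dj
            for j, dj in enumerate(dz):
                g[b][j] += dj
    for k in params:  # update only after all gradients are accumulated
        if isinstance(params[k][0], list):
            for row, grow in zip(params[k], g[k]):
                for j in range(len(row)):
                    row[j] -= lr * grow[j]
        else:
            for j in range(len(params[k])):
                params[k][j] -= lr * g[k][j]
    return loss


def save_params(params, path):
    with open(path, "w") as f:
        json.dump(params, f)


def load_params(path):
    with open(path) as f:
        return json.load(f)


rng = random.Random(1)
xs = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(40)]
ts = [[1.0, 0.0] if x0 > x1 else [0.0, 1.0] for x0, x1 in xs]
params = init_params(n_in=2, n_hidden=4, n_out=2)
first = train_step(params, xs, ts, lr=0.2)
for _ in range(500):
    last = train_step(params, xs, ts, lr=0.2)
print(f"loss {first:.3f} -> {last:.3f}")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "weights.json")
    save_params(params, path)
    restored = load_params(path)
print(restored == params)  # JSON round-trips Python floats exactly
```

Accumulating every gradient before touching the weights keeps the backward pass consistent with the forward pass that produced the loss.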

Section 06

Learning Value and Technical Highlights

Learning Value

Linear algebra: Matrix operations change from abstract concepts to concrete steps
Multivariable calculus: Manually calculate gradients to understand the application of the chain rule
Optimization techniques: Master gradient descent and the choice of learning rate and other hyperparameters

Technical Highlights

Pure Python implementation: No NumPy dependency, eliminating extra abstraction layers
Complete mathematical derivations: Covers the derivatives of the activation functions, the Softmax + cross-entropy gradient, and more
End-to-end system: Can train models, save weights, and run demos
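As an example of this kind of derivation, the Softmax + cross-entropy result quoted in Section 04 follows in a few lines. This is the standard textbook derivation, written out under the assumption of one-hot targets and a batch-averaged loss; it is not copied from the project's own proof.

```latex
% Softmax output and batch-averaged cross-entropy (one-hot targets t)
p_i = \frac{e^{z_i}}{\sum_k e^{z_k}}, \qquad
L = -\frac{1}{N} \sum_{n=1}^{N} \sum_i t_{n,i} \log p_{n,i}

% Softmax derivative (\delta_{ij} = 1 if i = j, else 0)
\frac{\partial p_i}{\partial z_j} = p_i \left( \delta_{ij} - p_j \right)

% Chain rule for one sample's logits, using \sum_i t_i = 1
\frac{\partial L}{\partial z_j}
  = -\frac{1}{N} \sum_i \frac{t_i}{p_i} \, p_i \left( \delta_{ij} - p_j \right)
  = -\frac{1}{N} \left( t_j - p_j \sum_i t_i \right)
  = \frac{p_j - t_j}{N}
```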

Practical Significance

Lays a foundation for using frameworks such as PyTorch and TensorFlow, making it easier to debug models and to follow research papers.

Section 07

Target Audience and Learning Suggestions

Target Audience

Beginners: Understand deep learning from scratch
Experienced developers: Bridge the gap between API calls and principle understanding
Educators: Use as supplementary course material

Learning Suggestions

Beginners: Learn in the order of README → source code → demo cases → modified experiments
Experienced users: Focus on backpropagation implementation and mathematical proofs
General suggestion: Start with a basic grounding in linear algebra and calculus

Project License: MIT License

Section 08

Summary and Outlook

This project is an excellent educational resource that shows deep learning can be understood and implemented from first principles. It stresses the importance of understanding what happens under the hood, builds skills in problem analysis, formula derivation, and debugging, and offers solid support for studying deep learning in depth.