# Needle: Building an Automatic Differentiation Engine and Neural Network Library from Scratch

> An in-depth analysis of the Needle project, a pure Python implementation of an automatic differentiation engine and neural network library. This article covers its backpropagation mechanism, computation graph construction, gradient derivation for various operators, and the implementation details of the SGD and Adam optimizers, helping readers understand from the bottom up how modern deep learning frameworks work.

- Board: [Openclaw Geo](https://www.zingnex.cn/en/forum/board/openclaw-geo)
- Published: 2026-06-06T23:03:56.000Z
- Last activity: 2026-06-06T23:20:57.252Z
- Popularity: 163.7
- Keywords: automatic differentiation, backpropagation, deep learning framework, Python, NumPy, neural network, computation graph, SGD, Adam, MNIST
- Page link: https://www.zingnex.cn/en/forum/thread/needle
- Canonical: https://www.zingnex.cn/forum/thread/needle

---

## Needle Project Guide: Building an Automatic Differentiation Engine and Neural Network Library from Scratch

Needle is a pure Python implementation of an automatic differentiation engine and neural network library whose only underlying dependency is NumPy. This article examines its core mechanisms: dynamic computation graph construction, backpropagation, operator gradient derivation, and optimizer implementation (SGD and Adam). It then looks at how the project checks its correctness on MNIST, helping readers understand how modern deep learning frameworks work underneath.

## Needle Project Background and Overview

Needle is maintained by darinbrion, and its source code is hosted on GitHub (link: https://github.com/darinbrion/needle-autograd-from-scratch, release date: 2026-06-06). It is a deep learning tool built from scratch, designed to help developers understand the mechanisms underlying frameworks such as PyTorch and JAX: dynamic computation graphs, reverse-mode automatic differentiation, and the module system. The project has a modular structure that is easy to read and learn from: autograd.py (core automatic differentiation), backend_numpy.py (NumPy CPU abstraction), ops/ (differentiable operators), optim.py (optimizers), and so on.

## Detailed Explanation of Automatic Differentiation Mechanism

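Needle uses a dynamic computation graph, similar to PyTorch's eager mode: every Tensor operation automatically records a node, so the DAG is built as the program runs and is evaluated lazily. Before backpropagation, a topological order of the graph is generated via post-order DFS; the backward pass then visits nodes in the reverse of that order, so each node is processed only after every node that consumes its output. For each node, backpropagation:

1. Sums the partial gradients contributed by the nodes that consume its output;
2. Computes the vector-Jacobian product (VJP) of the node's operator with that summed gradient;
3. Distributes the resulting gradients to the node's input nodes.

The gradients of leaf nodes (parameters) are stored in the .grad attribute, where the optimizer reads them for updates.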
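
To make the traversal concrete, here is a minimal scalar sketch of the same idea in plain Python. The `Value` class, `topo_sort`, and `backward` are hypothetical names chosen for illustration and are not Needle's actual API; Needle itself works on NumPy-backed tensors.

```python
class Value:
    """Hypothetical scalar graph node (illustrative, not Needle's Tensor)."""

    def __init__(self, data, inputs=(), vjp=None):
        self.data = data
        self.inputs = inputs  # nodes this value was computed from
        self.vjp = vjp        # maps the output adjoint to a tuple of input adjoints
        self.grad = 0.0

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), lambda g: (g, g))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other),
                     lambda g: (g * other.data, g * self.data))


def topo_sort(root):
    """Post-order DFS: every node is appended after all of its inputs."""
    order, visited = [], set()

    def visit(node):
        if id(node) in visited:
            return
        visited.add(id(node))
        for inp in node.inputs:
            visit(inp)
        order.append(node)

    visit(root)
    return order


def backward(root):
    order = topo_sort(root)
    for node in order:
        node.grad = 0.0
    root.grad = 1.0  # d(root)/d(root)
    # Reverse topological order: by the time a node is processed, all of its
    # consumers have already added their partial gradients to node.grad.
    for node in reversed(order):
        if node.vjp is None:
            continue  # leaf node: keep the accumulated gradient
        for inp, g in zip(node.inputs, node.vjp(node.grad)):
            inp.grad += g


x = Value(3.0)
y = Value(4.0)
z = x * y + x  # dz/dx = y + 1, dz/dy = x
backward(z)
print(x.grad, y.grad)
```

Note that `x` is used twice, so its gradient is the sum of two partial contributions; accumulating with `+=` is what implements the aggregation step.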

## Operator Implementation and Gradient Derivation

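Each Needle operator implements two methods: compute (the forward pass) and gradient (the backward pass). In the rules below, $\bar{v}$ (or $\bar{C}$) is the adjoint of the operator's output, that is, the gradient of the loss with respect to that output; $\bar{a}$, $\bar{b}$, $\bar{A}$, $\bar{B}$ are the adjoints of its inputs; and $\circ$ is the element-wise product. Key operator gradients:

- Matrix Multiplication (MatMul), $C = AB$: $\bar{A} = \bar{C} B^T, \bar{B} = A^T \bar{C}$
- Element-wise Multiplication (EWiseMul), $v = a \circ b$: $\bar{a} = \bar{v} \circ b, \bar{b} = \bar{v} \circ a$
- ReLU, $v = \max(a, 0)$: $\bar{a} = \bar{v} \circ \mathbf{1}[a > 0]$
- Exponential, $v = \exp(a)$: $\bar{a} = \bar{v} \circ \exp(a)$
- Natural logarithm, $v = \ln(a)$: $\bar{a} = \bar{v}/a$ (element-wise division)

These rules are the cornerstones of the automatic differentiation system.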
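
Gradient rules like these can be checked against finite differences. The sketch below does so for MatMul and ReLU with plain NumPy; the helper names (`matmul_vjp`, `relu_vjp`, `numerical_grad`) are assumptions made for this example and do not come from the Needle source.

```python
import numpy as np


def matmul_vjp(A, B, C_bar):
    """Input adjoints for C = A @ B, given the output adjoint C_bar."""
    return C_bar @ B.T, A.T @ C_bar


def relu_vjp(a, v_bar):
    """Input adjoint for v = max(a, 0)."""
    return v_bar * (a > 0)


def numerical_grad(f, x, eps=1e-6):
    """Central differences of the scalar f() with respect to x.

    f takes no arguments and must read x through a closure,
    because x is perturbed in place one entry at a time.
    """
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        old = x[idx]
        x[idx] = old + eps
        f_plus = f()
        x[idx] = old - eps
        f_minus = f()
        x[idx] = old
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
A = rng.normal(size=(3, 4))
B = rng.normal(size=(4, 2))
C_bar = rng.normal(size=(3, 2))

# Scalar test function L = sum(C_bar * (A @ B)), chosen so that dL/dC = C_bar.
matmul_loss = lambda: np.sum(C_bar * (A @ B))
A_bar, B_bar = matmul_vjp(A, B, C_bar)
matmul_ok = (np.allclose(A_bar, numerical_grad(matmul_loss, A), atol=1e-5)
             and np.allclose(B_bar, numerical_grad(matmul_loss, B), atol=1e-5))

a = rng.normal(size=(5,))
v_bar = rng.normal(size=(5,))
relu_loss = lambda: np.sum(v_bar * np.maximum(a, 0))
relu_ok = np.allclose(relu_vjp(a, v_bar), numerical_grad(relu_loss, a), atol=1e-5)

print(matmul_ok, relu_ok)
```

Weighting the output by a fixed random `C_bar` turns the matrix-valued operator into a scalar function, which is what makes the VJP directly comparable with a numerical gradient.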

## Optimizer Implementation: SGD and Adam

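Needle implements two optimizers: SGD with momentum and Adam. In both, $g_t$ is the gradient of the loss with respect to the parameters $\theta_t$ at step $t$, and $\alpha$ is the learning rate.

1. SGD with momentum ($\beta$: momentum coefficient, $\lambda$: weight decay):

   $u_t = \beta u_{t-1} + (1-\beta)(g_t + \lambda \theta_t)$

   $\theta_{t+1} = \theta_t - \alpha u_t$

2. Adam ($\beta_1$, $\beta_2$: decay rates of the first and second moment estimates, $\epsilon$: small constant for numerical stability):

   $m_t = \beta_1 m_{t-1} + (1-\beta_1)g_t$

   $v_t = \beta_2 v_{t-1} + (1-\beta_2)g_t^2$

   $\hat{m}_t = m_t/(1-\beta_1^t), \hat{v}_t = v_t/(1-\beta_2^t)$

   $\theta_{t+1} = \theta_t - \alpha \hat{m}_t/(\sqrt{\hat{v}_t} + \epsilon)$

Adam combines momentum (the first moment $m_t$) with a per-parameter adaptive learning rate (the second moment $v_t$). Dividing by $1-\beta_1^t$ and $1-\beta_2^t$ corrects the bias introduced by initializing $m_0$ and $v_0$ to zero.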
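
The update rules above translate almost line by line into NumPy. The following is a sketch under stated assumptions: the function names, signatures, and default hyperparameters are illustrative and are not taken from Needle's optim.py.

```python
import numpy as np


def sgd_momentum_step(theta, grad, u, lr=0.1, beta=0.9, weight_decay=0.0):
    """One SGD-with-momentum step; returns the new parameters and momentum."""
    u = beta * u + (1 - beta) * (grad + weight_decay * theta)
    return theta - lr * u, u


def adam_step(theta, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step at 1-based step count t; returns new parameters and moments."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)  # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)  # bias-corrected second moment
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v


# Toy objective f(theta) = 0.5 * ||theta||^2, whose gradient is theta itself.
theta0 = np.array([1.0, -2.0])
zeros = np.zeros_like(theta0)

theta_sgd, u = sgd_momentum_step(theta0, theta0, zeros)
theta_adam, m, v = adam_step(theta0, theta0, zeros, zeros, t=1)

print(theta_sgd)
print(theta_adam)
```

On the very first step the bias correction gives $\hat{m}_1 = g_1$ and $\hat{v}_1 = g_1^2$, so each parameter moves by approximately $\alpha$ in the direction opposite its gradient regardless of the gradient's magnitude. The SGD step, by contrast, scales with the gradient.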

## MNIST Practice Verification: Model Correctness Check

The project validates the engine on MNIST with two models:

- Linear classifier: trained with softmax cross-entropy loss, it reaches a test error of about 8%, and its gradients are consistent with the manually derived ones.
- Two-layer neural network: the forward pass is $z = W_2^T \mathrm{ReLU}(W_1^T x)$ and the test error is about 1.9%. The automatic differentiation results again agree with the manually derived gradients, which supports the correctness of the engine.
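
The kind of consistency check described above can be sketched as follows: hand-derived gradients of the two-layer network's softmax cross-entropy loss are compared against finite differences. This runs on small random data rather than MNIST and does not reproduce the reported error rates; all function names are assumptions for illustration, and the network is written in batched form (one example per row of `X`).

```python
import numpy as np


def softmax_loss(logits, y):
    """Mean softmax cross-entropy for integer class labels y."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(y)), y].mean()


def forward(X, W1, W2):
    """Batched form of z = W2^T ReLU(W1^T x)."""
    return np.maximum(X @ W1, 0) @ W2


def manual_grads(X, y, W1, W2):
    """Hand-derived gradients of the mean loss with respect to W1 and W2."""
    n = X.shape[0]
    Z1 = np.maximum(X @ W1, 0)
    logits = Z1 @ W2
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    G2 = probs.copy()
    G2[np.arange(n), y] -= 1.0   # softmax(logits) - one_hot(y)
    G1 = (Z1 > 0) * (G2 @ W2.T)  # backpropagate through ReLU
    return X.T @ G1 / n, Z1.T @ G2 / n


def numerical_grad(f, W, eps=1e-6):
    """Central differences of the scalar f() with respect to W (perturbed in place)."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        old = W[idx]
        W[idx] = old + eps
        plus = f()
        W[idx] = old - eps
        minus = f()
        W[idx] = old
        grad[idx] = (plus - minus) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))           # 8 examples, 5 features
y = rng.integers(0, 3, size=8)        # 3 classes
W1 = rng.normal(size=(5, 4)) * 0.5
W2 = rng.normal(size=(4, 3)) * 0.5

loss = lambda: softmax_loss(forward(X, W1, W2), y)
gW1, gW2 = manual_grads(X, y, W1, W2)
grads_match = (np.allclose(gW1, numerical_grad(loss, W1), atol=1e-5)
               and np.allclose(gW2, numerical_grad(loss, W2), atol=1e-5))
print(grads_match)
```

An autodiff engine can be validated the same way: replace `manual_grads` with the gradients the engine produces and compare them against either the hand-derived or the numerical ones.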

## Learning Value and Practical Significance of Needle

Needle offers learners:

1. Underlying understanding: see what loss.backward() actually does under the hood;
2. Algorithm implementation: implement core algorithms such as backpropagation and optimizers by hand;
3. Debugging ability: understand the computation graph structure and diagnose gradient issues;
4. Framework design: learn how to design a scalable and maintainable library.

It also shows that a complete neural network training system can be built on NumPy alone.

## Conclusion: Insights from the Underlying Principles of Deep Learning Frameworks

Needle breaks automatic differentiation down into understandable modules, showing that the core mechanisms of deep learning frameworks are not mysterious. Whether as learning material or as a reference implementation, it rewards in-depth study by practitioners. Understanding the underlying principles not only helps in using existing frameworks more effectively but also lays the foundation for developing new tools.