Building Large Models from Scratch: 23 Notebooks for a Full-Stack Understanding of Modern LLMs

A hands-on tutorial that implements the core components of large models from scratch in PyTorch rather than calling pre-built implementations, covering the full stack: Tokenizer, Attention, MoE, RLHF, and inference acceleration. It suits learners who want to understand the internals, not just know how to call APIs.

Tags: Large Language Models, PyTorch, Jupyter Notebook, Transformer, BPE Tokenizer, Attention Mechanism, MoE, RLHF, Inference Acceleration, Knowledge Distillation

Published 2026-05-21 14:15 | Recent activity 2026-05-21 14:19 | Estimated read 5 min

Section 01

Introduction: 23 Notebooks to Build a Full-Stack Understanding of Modern LLMs from Scratch

Section 02

Background: Why Do We Need to 'Build Large Models from Scratch'?

Learning resources for large language models tend to fall short in one of two ways. High-level paper reviews explain the principles but cannot be turned directly into code; API-calling tutorials get something running quickly but leave the model a black box. The walkinglabs/modern-llm-notebook project fills this gap: every core component must be implemented from scratch in PyTorch, so learners work directly with tensor operations and gradient flow and build a deeper understanding.

Section 03

Methodology: A Complete Learning Path with Five Modules

The project is divided into five progressive modules:

Basic Construction (Notebooks 01-05): Tokenizer, positional encoding, Multi-Head Attention, and a Mini-GPT skeleton;
Training Techniques (Notebooks 06-14): architecture optimization (LLaMA improvements, MoE), training workflow, data engineering, LoRA, RLHF;
Inference Acceleration (Notebooks 15-17): generation strategies, KV Cache, FlashAttention, speculative decoding;
Cutting-Edge Exploration (Notebooks 18-20): long-context extension, Chain of Thought, VLM;
Production Practice (Notebooks 21-23): evaluation system, knowledge distillation, policy distillation.

Each Notebook follows the same cycle: intuitive understanding → manual calculation verification → code implementation → experimental observation.
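To make the "manual calculation verification → code implementation" step concrete, here is a minimal sketch of a single byte-pair-encoding (BPE) merge, the core operation behind the Tokenizer in the first module. It is not taken from the project's notebooks: the function names `count_pairs` and `merge_pair` and the toy corpus are assumptions chosen for illustration, and only the Python standard library is used.

```python
from collections import Counter


def count_pairs(words):
    """Count adjacent symbol pairs over a corpus of {symbols_tuple: frequency}."""
    pairs = Counter()
    for symbols, freq in words.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def merge_pair(words, pair):
    """Return a new corpus in which every occurrence of `pair` is fused into one symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])  # fuse the two symbols
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


# Toy corpus (hypothetical): each word is split into characters and paired with a count.
corpus = {("l", "o", "w"): 5, ("l", "o", "t"): 2, ("n", "e", "w"): 3}

# One BPE training step: find the most frequent adjacent pair, then merge it everywhere.
best = count_pairs(corpus).most_common(1)[0][0]
corpus = merge_pair(corpus, best)
```

In this toy corpus the pair `("l", "o")` occurs 7 times (5 + 2), more than any other pair, so it is merged into the single symbol `"lo"`. A full tokenizer would repeat this step until the vocabulary reaches a target size.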

Section 04

Evidence: Direct Correspondence with Classic Papers

The project's core algorithms are closely linked to original papers:

Paper	Notebook	Implemented Content
Attention Is All You Need	04	Multi-Head Attention, Sinusoidal PE
LLaMA	06	RMSNorm, SwiGLU, RoPE
LoRA	12	Low-Rank Adaptation, A*B Decomposition
RLHF/PPO	14	Reward Model, PPO clip

This mapping lets learners move from a paper straight to runnable code for the same technique, which reinforces what they have just read.
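Two of the table entries are small enough to sketch in a few lines. The block below shows RMSNorm (from the LLaMA row) and the per-sample PPO clipped objective (from the RLHF/PPO row) in plain Python on lists and scalars. It is an illustrative sketch, not the project's code: the function names, the default `eps` values, and the use of lists instead of PyTorch tensors are all assumptions.

```python
import math


def rms_norm(x, gain, eps=1e-6):
    """RMSNorm: divide x by its root-mean-square, then apply a per-element gain."""
    mean_square = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_square + eps)
    return [g * v * scale for v, g in zip(x, gain)]


def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Per-sample PPO clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    `ratio` is the new-policy probability divided by the old-policy probability
    for the sampled action; `advantage` is its estimated advantage.
    """
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped_ratio * advantage)


# With unit gain, the normalized vector has a mean square of (almost exactly) 1.
normalized = rms_norm([3.0, 4.0], gain=[1.0, 1.0])

# A ratio above 1 + eps earns no extra credit when the advantage is positive.
capped = ppo_clip_objective(ratio=1.5, advantage=1.0)
```

Unlike LayerNorm, RMSNorm does not subtract the mean, and the clipped objective is what keeps a PPO update from moving the policy too far from the one that collected the data. In training code the objective is averaged over a batch and maximized (or its negative minimized).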

Section 05

Suggestions: Technical Threshold and Learning Guide

The project requires Python 3.9+, PyTorch 2.0+, and 16 GB of memory. Most Notebooks run on CPU; a GPU is recommended for the training Notebooks. The Notebooks are modular, so you can jump to the sections you need:

Those with Transformer basics can skip ahead to MoE or inference acceleration;
Those focused on deployment can go straight to production practice;
Those who want to fill in gaps systematically can work through the Notebooks in order.

A React+Vite web reader is also provided for a smoother reading experience.
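The stated software requirements (Python 3.9+ and PyTorch 2.0+) can be checked before opening any Notebook. The following is a small sketch using only the standard library; the helper names `parse_major_minor` and `check_environment` are hypothetical, and the 16 GB memory requirement is not checked because the standard library has no portable call for it.

```python
import re
import sys
from importlib import metadata

MIN_PYTHON = (3, 9)  # from the stated requirements
MIN_TORCH = (2, 0)   # from the stated requirements


def parse_major_minor(version):
    """Turn a version string such as '2.3.1+cu121' into the tuple (2, 3)."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def check_environment():
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required")
    try:
        # Reads the installed package metadata without importing torch itself.
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        problems.append("PyTorch is not installed")
    else:
        if parse_major_minor(torch_version) < MIN_TORCH:
            problems.append(f"PyTorch {MIN_TORCH[0]}.{MIN_TORCH[1]}+ is required, found {torch_version}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Reading the version from package metadata avoids the cost of importing PyTorch just to check that it is present.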

Section 06

Conclusion: Practical Value and Unique Positioning

Compared with tutorials such as nanoGPT, this project stands out for its completeness (the full stack from Tokenizer to policy distillation) and for covering recent advances from 2024-2025 such as speculative decoding and VLM. It suits researchers, engineers, and students who want to understand the internal mechanisms of large models. Implementing each component by hand builds an understanding that calling APIs alone does not.
