
Building a Large Language Model from Scratch: A Complete Handwritten LLM Training Workflow

This article introduces a project that builds a large language model from scratch on a Mac in 10 stages, from data preparation to Ollama deployment, using pure PyTorch without frameworks such as Hugging Face.

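Large Language Model · LLM · PyTorch · Building from Scratch · BPE Tokenizer · Transformer · Supervised Fine-tuning · Ollama Deployment · Machine Learning · Deep Learning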

Published 2026-06-09 02:13 · Recent activity 2026-06-09 02:21 · Estimated read 7 min


Section 01

Overview: Building an LLM from Scratch in Pure PyTorch, Runnable on a Mac

This article introduces the open-source project "story-llm-finetuned-mac" by developer sppandita85, which builds a large language model from scratch on a Mac in 10 stages, from data preparation to Ollama deployment. It is implemented in pure PyTorch, without frameworks such as Hugging Face, and runs on a CPU. The model is trained on only 50 moral stories (about 6,000 tokens), so it memorizes and overfits its training data. Even so, its workflow mirrors that of industrial-grade LLMs, which makes it well suited to learners who want to understand how LLMs work internally.

Section 02

Project Background and Design Philosophy

High-level frameworks often wrap LLM training in a "black box", which does little for developers who want to understand it in depth. This project makes "architecture fidelity" its core design principle: although the data is small, it reproduces the full workflow of industrial-grade LLM training, so learners can walk through the entire LLM life cycle on a personal Mac. Because it uses pure PyTorch and runs on a CPU, the barrier to entry is low.

Section 03

Data Processing and Model Construction (Stages 1-4)

The project divides the training workflow into 10 stages:

Stage 1 (Data Preparation): Clean the raw Markdown, insert special tokens, and split the text into training and validation sets;
Stage 2 (Tokenizer Training): Train a custom BPE tokenizer from scratch, which handles out-of-vocabulary words by breaking them into known subword units;
Stage 3 (Data Encoding): Encode the text into token IDs, store them as binary files, and implement a sliding-window DataLoader;
Stage 4 (Model Construction): Implement a GPT-style Transformer from scratch in PyTorch, including components such as multi-head attention and the feed-forward network, and verify that the model is correct.
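To make Stages 2 and 3 more concrete, here is a minimal sketch of BPE merge learning and of the sliding-window (input, target) pairing that a next-token DataLoader is typically built on. It is written in plain Python rather than PyTorch so it runs without dependencies, and every function name (`train_bpe`, `encode_word`, `sliding_windows`, and the helpers) is hypothetical, not taken from the project's common directory. It also omits things a real tokenizer needs, such as special tokens and saving the vocabulary.

```python
from collections import Counter

# Hypothetical helpers: a simplified, character-level BPE learner and a
# sliding-window pairing for next-token prediction (plain Python, no PyTorch).


def get_pair_counts(words: dict[tuple[str, ...], int]) -> Counter:
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs: Counter = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(words: dict[tuple[str, ...], int], pair: tuple[str, str]) -> dict[tuple[str, ...], int]:
    """Replace every occurrence of `pair` in every word with one merged symbol."""
    merged: dict[tuple[str, ...], int] = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(text: str, num_merges: int) -> list[tuple[str, str]]:
    """Learn up to `num_merges` merge rules, starting from single characters."""
    words = dict(Counter(tuple(w) for w in text.split()))
    merges: list[tuple[str, str]] = []
    for _ in range(num_merges):
        pairs = get_pair_counts(words)
        if not pairs:
            break
        best = max(pairs, key=pairs.__getitem__)  # most frequent pair; ties go to the first seen
        words = merge_pair(words, best)
        merges.append(best)
    return merges


def encode_word(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Apply learned merges in order; unseen words fall back to smaller pieces."""
    symbols = tuple(word)
    for pair in merges:
        symbols = next(iter(merge_pair({symbols: 1}, pair)))
    return list(symbols)


def sliding_windows(ids: list[int], context_length: int, stride: int) -> list[tuple[list[int], list[int]]]:
    """Build (input, target) pairs; each target is its input shifted one token right."""
    pairs = []
    for start in range(0, len(ids) - context_length, stride):
        x = ids[start:start + context_length]
        y = ids[start + 1:start + context_length + 1]
        pairs.append((x, y))
    return pairs
```

Trained on "low low low lower lowest" with two merges, this learns `("l", "o")` and then `("lo", "w")`, so an unseen word such as "lowly" is encoded as `["low", "l", "y"]` instead of a single unknown token. With a stride equal to the context length the windows do not overlap; a smaller stride yields more, overlapping training examples from a small corpus.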

Section 04

Pre-training and Supervised Fine-tuning (Stages 5-8)

Stage 5 (Pre-training): Train with the AdamW optimizer and a learning-rate schedule of warmup followed by cosine annealing, with gradient clipping and checkpoint saving;
Stage 6 (Text Generation): Sample text from the pre-trained model to evaluate the effect of pre-training;
Stage 7 (Q&A Dataset Construction): Derive instruction-style Q&A pairs from the pre-training corpus and convert them into a dialogue training format;
Stage 8 (Supervised Fine-tuning): Train on the Q&A dataset with loss masking (the loss is computed only on the answer tokens) so that the model learns to follow instructions.
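Two of the training details above, the warmup-plus-cosine learning rate from Stage 5 and the answer-only loss mask from Stage 8, can be sketched in a few lines. This sketch assumes a linear warmup followed by cosine decay down to a floor `min_lr`, and it marks ignored targets with -100, which is the default `ignore_index` of PyTorch's `torch.nn.functional.cross_entropy`. The function names and the exact schedule shape are illustrative assumptions, not the project's code.

```python
import math

# Convention used by PyTorch's cross_entropy: targets equal to ignore_index
# (default -100) contribute nothing to the loss.
IGNORE_INDEX = -100


def lr_at(step: int, max_lr: float, min_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup to max_lr, then cosine annealing down to min_lr."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    if step >= total_steps:
        return min_lr
    progress = (step - warmup_steps) / (total_steps - warmup_steps)  # goes 0 -> 1
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def build_sft_example(prompt_ids: list[int], answer_ids: list[int]) -> tuple[list[int], list[int]]:
    """Return (inputs, labels) for next-token training with loss on answer tokens only."""
    if not prompt_ids or not answer_ids:
        raise ValueError("prompt and answer must both be non-empty")
    ids = prompt_ids + answer_ids
    inputs = ids[:-1]  # what the model sees
    labels = ids[1:]   # labels[i] is the token to predict after inputs[: i + 1]
    # The first len(prompt) - 1 targets are still prompt tokens: mask them out.
    n_masked = len(prompt_ids) - 1
    labels = [IGNORE_INDEX] * n_masked + labels[n_masked:]
    return inputs, labels
```

For a prompt `[1, 2, 3]` and answer `[4, 5]`, the labels become `[-100, -100, 4, 5]`: only the predictions of answer tokens count. In a PyTorch loop, the computed rate would be written into each AdamW `param_group["lr"]` before a step, and a labels tensor built this way can be passed to `cross_entropy`, which with its default mean reduction averages only over the non-ignored targets.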

Section 05

Interaction and Deployment (Stages 9-10)

Stage 9 (Dialogue Interaction): Provide a command-line interface for chatting interactively with the fine-tuned model;
Stage 10 (Ollama Deployment): Convert the model to GGUF format (quantization reduces memory usage) and deploy it to Ollama so that users can access it easily.
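Why quantization reduces memory usage can be shown with a toy version of block-wise 8-bit quantization: each block of 32 weights keeps one scale plus 32 small integers in the signed 8-bit range. This is loosely modeled on the idea behind llama.cpp's Q8_0 type, but it is not a GGUF writer and not the project's conversion step, which the article does not describe in detail; the function names are hypothetical.

```python
# Toy block-wise symmetric 8-bit quantization (hypothetical helpers, not a GGUF writer).


def quantize_q8_blocks(values: list[float], block_size: int = 32) -> list[tuple[float, list[int]]]:
    """Split values into blocks; store one scale per block and int8-range codes."""
    blocks = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        amax = max(abs(v) for v in block)
        scale = amax / 127 if amax > 0 else 1.0  # map the largest magnitude to +/-127
        codes = [max(-127, min(127, round(v / scale))) for v in block]
        blocks.append((scale, codes))
    return blocks


def dequantize_q8_blocks(blocks: list[tuple[float, list[int]]]) -> list[float]:
    """Reconstruct approximate floats: value = scale * code."""
    return [scale * code for scale, codes in blocks for code in codes]
```

If each scale is stored as a 16-bit float, a block of 32 weights takes 32 + 2 = 34 bytes instead of 32 x 4 = 128 bytes in float32, roughly a 3.8x reduction, at the cost of a rounding error of at most half a quantization step per weight.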

Section 06

Technical Highlights and Scalability

The code is well organized: shared code (configuration, tokenizer, model, and so on) lives in the common directory, and the modular design makes the project easy to extend. To scale up to real-world training, you only need to change hyperparameters in common/config.py: enlarge the vocabulary, the number of layers, the number of attention heads, and the embedding dimension; train for more epochs; point to a larger corpus; and switch to a GPU.
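As a rough illustration of what "only changing hyperparameters" means, the sketch below models a configuration as a frozen dataclass and estimates the parameter count of a GPT-2-style decoder (learned position embeddings, biases, a 4x MLP, and an output head tied to the token embedding). The field names and values are hypothetical stand-ins, not the contents of the project's common/config.py, and the architecture details are assumptions.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GPTConfig:
    # Hypothetical field names and toy defaults, not the project's actual values.
    vocab_size: int = 2000
    context_length: int = 128
    n_layer: int = 4
    n_head: int = 4
    n_embd: int = 128
    epochs: int = 10
    data_path: str = "data/stories.md"
    device: str = "cpu"  # e.g. "mps" on Apple silicon or "cuda" on an NVIDIA GPU

    def __post_init__(self) -> None:
        if self.n_embd % self.n_head != 0:
            raise ValueError("n_embd must be divisible by n_head")


def count_params(cfg: GPTConfig) -> int:
    """Parameters of a GPT-2-style decoder with biases and a weight-tied output head."""
    d = cfg.n_embd
    embeddings = cfg.vocab_size * d + cfg.context_length * d  # token + learned positions
    # Per block: QKV (3d^2 + 3d) + output proj (d^2 + d) + MLP d->4d->d (8d^2 + 5d) + 2 LayerNorms (4d)
    per_block = 12 * d * d + 13 * d
    final_norm = 2 * d
    return embeddings + cfg.n_layer * per_block + final_norm


# Scaling up is just a different set of hyperparameters (here: GPT-2 small's shape).
SCALED = replace(
    GPTConfig(),
    vocab_size=50257,
    context_length=1024,
    n_layer=12,
    n_head=12,
    n_embd=768,
    epochs=20,
    data_path="data/large_corpus.txt",  # hypothetical path
    device="cuda",
)
```

Under these assumptions the scaled configuration has the same shape as GPT-2 small and comes out at 124,439,808 parameters, versus about one million for the toy defaults. Note that `n_head` does not change the count; it only changes how the embedding dimension is split across attention heads.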

Section 07

Learning Value and Practical Significance

This project offers LLM learners an excellent entry point. By running each stage, you build an end-to-end understanding of data processing, tokenization, the Transformer architecture, optimization strategies, and deployment. The author also provides a Model Card documenting the model and has published it on Ollama (ollama.com/sppandita85/story-llm), so you can try it directly.

Section 08

Project Summary

The "story-llm-finetuned-mac" project is small in scale but complete in workflow. Its pure PyTorch implementation lets learners understand what each step of the pipeline actually does. For developers who want to master LLM technology at the level of first principles, it is an excellent open-source project worth studying in depth.
