# Building a Large Language Model from Scratch: A Complete Handwritten LLM Training Workflow

> This article introduces a complete project for building a large language model from scratch on a Mac. It covers 10 stages, from data preparation to Ollama deployment, and uses a pure PyTorch implementation that does not rely on frameworks such as HuggingFace.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-06-08T18:13:32.000Z
- Last activity: 2026-06-08T18:21:17.590Z
- Popularity: 163.9
- Keywords: large language model, LLM, PyTorch, building from scratch, BPE tokenizer, Transformer, supervised fine-tuning, Ollama deployment, machine learning, deep learning
- Page link: https://www.zingnex.cn/en/forum/thread/llm-85f15a71
- Canonical: https://www.zingnex.cn/forum/thread/llm-85f15a71

---

## Guide to the Full Workflow of Building an LLM from Scratch: Pure PyTorch Implementation, Runnable on a Mac

This article introduces the open-source project "story-llm-finetuned-mac" created by developer sppandita85. The project builds a large language model from scratch on a Mac and covers 10 stages, from data preparation to Ollama deployment. It is implemented in pure PyTorch, without frameworks such as HuggingFace, and can run on CPU. The model is trained on only 50 moral stories (about 6,000 tokens), so it memorizes its training data and overfits. Its workflow, however, follows the same steps as industrial-grade LLM training, which makes it a good way for learners to understand the internal mechanisms.

## Project Background and Design Philosophy

High-level frameworks often wrap LLM training in a "black box", which makes it hard for developers to learn how the pieces actually work. This project takes "architecture fidelity" as its core design philosophy: the data scale is small, but every step of an industrial-grade LLM training workflow is reproduced, so learners can walk through the whole LLM life cycle on a personal Mac. Because the implementation is pure PyTorch and runs on CPU, the barrier to entry is low.

## Data Processing and Model Construction (Stages 1-4)

The project divides the training workflow into 10 stages. The first four cover data processing and model construction:
- **Stage 1 (Data Preparation)**: Clean the raw markdown, insert special tokens, and split the text into training and validation sets;
- **Stage 2 (Tokenizer Training)**: Train a custom BPE tokenizer from scratch so that out-of-vocabulary words can still be encoded;
- **Stage 3 (Data Encoding)**: Encode the text into token IDs, store them as binary files, and implement a sliding-window DataLoader;
- **Stage 4 (Model Construction)**: Implement a GPT-style Transformer from scratch in PyTorch, including components such as multi-head attention and the feed-forward network, and verify that the model is correct.
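
Stages 2 and 3 can be illustrated with a small, dependency-free sketch. It is not the project's code: the function names are hypothetical, and it assumes a byte-level BPE (starting from the 256 raw byte values), which is one common way to make sure no input is ever out of vocabulary. The last function shows the sliding-window idea, where each target sequence is the input shifted by one token.

```python
from collections import Counter

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges, most frequent adjacent pair first."""
    # Assumption: byte-level start, so any string can be encoded later.
    ids = list(text.encode("utf-8"))
    merges = {}  # (left_id, right_id) -> new_id, in learned order
    for new_id in range(256, 256 + num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        if pairs[best] < 2:
            break  # no pair repeats, so further merges would not compress
        merges[best] = new_id
        ids = merge(ids, best, new_id)
    return merges

def encode(text, merges):
    """Encode text by applying merges in the order they were learned."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():
        ids = merge(ids, pair, new_id)
    return ids

def decode(ids, merges):
    """Map token IDs back to text by expanding each merge into bytes."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (left, right), new_id in merges.items():
        vocab[new_id] = vocab[left] + vocab[right]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")

def sliding_windows(ids, context_len, stride):
    """Yield (input, target) pairs; the target is the input shifted by one."""
    for start in range(0, len(ids) - context_len, stride):
        chunk = ids[start:start + context_len + 1]
        yield chunk[:-1], chunk[1:]
```

In a PyTorch setup, the pairs produced by `sliding_windows` would typically be wrapped in a `Dataset` and converted to tensors; a smaller `stride` yields more overlapping training examples from the same corpus.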

## Pre-training and Supervised Fine-tuning (Stages 5-8)

- **Stage 5 (Pre-training)**: Train with the AdamW optimizer and a learning rate schedule that combines warmup with cosine annealing, with gradient clipping and checkpoint saving;
- **Stage 6 (Text Generation)**: Sample text from the pre-trained model to evaluate the effect of pre-training;
- **Stage 7 (Q&A Dataset Construction)**: Derive instruction Q&A pairs from the pre-training corpus and convert them into a dialogue training format;
- **Stage 8 (Supervised Fine-tuning)**: Train on the Q&A dataset with a masked loss (loss is computed only on the answer tokens) so that the model learns to follow instructions.
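
Two ideas from these stages are easy to show in isolation: the warmup plus cosine learning rate schedule from Stage 5 and the answer-only loss from Stage 8. The sketch below is written in plain Python so that it runs without dependencies. All names and default choices are assumptions for illustration, not the project's actual code. The sentinel value `-100` is chosen because it matches the default `ignore_index` of PyTorch's `cross_entropy`, which is the usual way to get the same masking effect in a real training loop.

```python
import math

IGNORE = -100  # positions with this target contribute nothing to the loss

def lr_at(step, max_lr, min_lr, warmup_steps, total_steps):
    """Linear warmup to max_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    if step >= total_steps:
        return min_lr
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

def build_targets(token_ids, prompt_len):
    """Next-token targets, with every target that is a prompt token masked.

    token_ids holds prompt tokens followed by answer tokens;
    the first prompt_len entries are the prompt.
    """
    targets = token_ids[1:]  # target at position i is token i + 1
    return [IGNORE if i + 1 < prompt_len else t for i, t in enumerate(targets)]

def masked_cross_entropy(logits, targets):
    """Mean negative log-likelihood over the non-masked positions only."""
    total, count = 0.0, 0
    for row, target in zip(logits, targets):
        if target == IGNORE:
            continue
        peak = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = peak + math.log(sum(math.exp(v - peak) for v in row))
        total += log_z - row[target]
        count += 1
    return total / max(count, 1)
```

Because masked positions are skipped entirely, the model is never rewarded or penalized for reproducing the question; only its predictions of the answer tokens shape the gradient.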

## Interaction and Deployment (Stages 9-10)

- **Stage 9 (Dialogue Interaction)**: Provide a command-line interface so that users can chat interactively with the fine-tuned model;
- **Stage 10 (Ollama Deployment)**: Convert the model to GGUF format (quantization reduces memory usage) and deploy it to the Ollama platform so that users can access it easily.
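
To see why quantization reduces memory usage, the following sketch quantizes a block of weights to 8-bit integers with one shared scale and estimates the resulting storage. It is a simplified illustration of blockwise quantization in general, not the exact scheme used by any GGUF format or by this project; the block size of 32 and the 16-bit scale are assumptions.

```python
import math

def quantize_block(weights):
    """Symmetric 8-bit quantization: integers in [-127, 127] plus one scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale

def dequantize_block(quantized, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [q * scale for q in quantized]

def storage_bytes(num_weights, bits, block_size=32, scale_bits=16):
    """Bytes needed when each block stores its weights plus one scale.

    block_size and scale_bits are illustrative assumptions.
    """
    num_blocks = math.ceil(num_weights / block_size)
    return (num_weights * bits + num_blocks * scale_bits) // 8
```

Under these assumptions, one million weights take 4,000,000 bytes as 32-bit floats but `storage_bytes(1_000_000, 8)` bytes, roughly a quarter of that, after 8-bit quantization. The price is a rounding error of at most half a scale step per weight.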

## Technical Highlights and Scalability

The project code is well organized: shared code (configuration, tokenizer, model, and so on) lives in the `common` directory, and the modular design is easy to extend. According to the project, scaling up toward real-world training mainly means changing hyperparameters in `common/config.py`: increase the vocabulary size, the number of layers, the number of attention heads, and the embedding dimension; train for more epochs; point to a larger corpus; and switch the device to GPU.
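
The effect of those hyperparameters on model size can be estimated with a short sketch. The field names and default values below are hypothetical and are not taken from the project's `common/config.py`. The parameter formula assumes a GPT-2 style architecture: learned positional embeddings, biases on all linear layers, a feed-forward layer four times the embedding width, and an output head that shares weights with the token embedding.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class ModelConfig:
    # Hypothetical toy defaults, small enough for CPU training.
    vocab_size: int = 512
    context_len: int = 64
    n_layer: int = 2
    n_head: int = 2
    n_embd: int = 64
    device: str = "cpu"

    def __post_init__(self):
        # Each attention head gets an equal slice of the embedding.
        if self.n_embd % self.n_head != 0:
            raise ValueError("n_embd must be divisible by n_head")

def count_params(cfg):
    """Estimate trainable parameters under the GPT-2 style assumptions above."""
    d = cfg.n_embd
    embeddings = cfg.vocab_size * d + cfg.context_len * d
    attention = 4 * d * d + 4 * d      # fused QKV plus output projection
    feed_forward = 8 * d * d + 5 * d   # d -> 4d -> d, with biases
    layer_norms = 4 * d                # two LayerNorms per block
    per_layer = attention + feed_forward + layer_norms
    final_norm = 2 * d
    return embeddings + cfg.n_layer * per_layer + final_norm

toy = ModelConfig()
# Scaling up is a matter of overriding fields, here to GPT-2 small dimensions.
gpt2_small = replace(toy, vocab_size=50257, context_len=1024,
                     n_layer=12, n_head=12, n_embd=768, device="cuda")
```

With these settings, `count_params(toy)` gives about 137 thousand parameters, while `count_params(gpt2_small)` gives about 124 million, which shows how quickly the layer count and embedding dimension dominate the total.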

## Learning Value and Practical Significance

This project is a good entry path for LLM learners. Running each stage in turn builds an end-to-end understanding of data processing, tokenization, the Transformer, optimization strategy, and deployment. The author also provides a Model Card that records model information and has published the model to the Ollama platform (ollama.com/sppandita85/story-llm) so that it can be tried directly.

## Project Summary

The "story-llm-finetuned-mac" project is small in scale but complete in workflow. Because it is implemented in pure PyTorch, learners can see what each step actually does. For developers who want to understand LLM technology at the level of principles, it is an open-source project worth studying in depth.
