Zing Forum

LLM Pipeline Visualizer: Visualize the Inference Process of Large Language Models in the Browser

An interactive educational tool that walks through the full LLM inference pipeline in seven steps, from text to tokens, embeddings, attention, logits, and sampling, running entirely in the browser.

Tags: LLM visualization, Transformers.js, educational tools, attention mechanism, GPT-2, tokenization, machine learning education
Published 2026-06-11 04:43 | Recent activity 2026-06-11 04:57 | Estimated read 8 min

Section 01

[Introduction] LLM Pipeline Visualizer: An Educational Tool for Visualizing LLM Inference in the Browser

This article introduces LLM Pipeline Visualizer, an interactive educational tool that walks through the full inference pipeline of a large language model (using DistilGPT-2 as the example), from text input to generating the next token, in 7 steps. Key features: it runs a real model directly in the browser with Transformers.js (no simulated data); it supports real-time interaction (such as adjusting temperature and inspecting attention heads); it explains concepts step by step in a "scrolling narrative" format; and it offers Spanish-language content and shareable exploration links.

Section 02

Project Background and Overview

This project is developed and maintained by Mahiler1909 and was released in June 2026. The source code is hosted on GitHub (https://github.com/Mahiler1909/llm-pipeline-visualizer), and an online demo is available at https://mahiler1909.github.io/llm-pipeline-visualizer/. Positioned as an educational tool, it demonstrates the autoregressive generation process of LLMs in a "scrolling narrative" format, with all data coming from real model outputs rather than simulations. After entering a prompt, users scroll through 7 full-screen chapters in sequence, each teaching one core concept with interactive components.

Section 03

Core Steps and Interactive Features

The tool includes 7 core steps:

  1. Texto (Text): The original text entered by the user, which serves as the starting point for the interaction.
  2. Tokens (Tokenization): Shows how the text is split into tokens and their corresponding IDs via BPE, with a built-in real-time mini tokenizer for users to try.
  3. Embeddings: Displays real token embedding vectors, fetched on demand via HTTP Range requests. Visualizations include 48-dimensional bar charts and cosine similarity matrices.
  4. Atención (Attention): Shows the real attention computation for layer 0, viewable per attention head or as an average, with attention weights displayed as percentages.
  5. Logits: Displays the raw logits output by the model and the probability distribution after softmax, with the top-15 candidate tokens and a temperature slider that reshapes the distribution.
  6. Muestreo (Sampling): Shows how a token is sampled from the probability distribution, with top-k/top-p adjustment, a greedy mode toggle, and resampling.
  7. El bucle (The Loop): Appends the sampled token to the original text and re-runs the pipeline to achieve autoregressive generation, with a counter that tracks the number of loop iterations.
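
Steps 5 to 7 can be sketched in a few lines of plain Python. This is a minimal illustration of temperature-scaled softmax, top-k/top-p filtering, greedy versus random sampling, and the autoregressive loop, not the tool's actual code. All function names are hypothetical, and `next_logits` is a stand-in for a real model call.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw logits to probabilities. temperature must be > 0;
    values below 1 sharpen the distribution, values above 1 flatten it."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_top_k_top_p(probs, top_k=None, top_p=1.0):
    """Keep the top_k most likely tokens, then the smallest prefix of them
    whose cumulative probability reaches top_p. Returns {token_id: prob}."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}  # renormalised over survivors


def sample_token(logits, temperature=1.0, top_k=None, top_p=1.0,
                 greedy=False, rng=None):
    """Pick the next token id from raw logits."""
    probs = softmax_with_temperature(logits, temperature)
    if greedy:
        return max(range(len(probs)), key=lambda i: probs[i])
    candidates = filter_top_k_top_p(probs, top_k, top_p)
    if rng is None:
        rng = random.Random()
    ids = list(candidates)
    return rng.choices(ids, weights=[candidates[i] for i in ids], k=1)[0]


def generate(next_logits, token_ids, steps, **sampling):
    """Autoregressive loop: sample a token, append it, run the model again.
    next_logits is a hypothetical callable: list of token ids -> logits."""
    token_ids = list(token_ids)
    for _ in range(steps):
        token_ids.append(sample_token(next_logits(token_ids), **sampling))
    return token_ids
```

Because temperature is applied to logits that are already computed, moving a temperature slider only requires re-running the softmax, not the model. Passing a seeded `random.Random` as `rng` is one way to make a draw from the same distribution repeatable.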

Section 04

Highlights of Technical Implementation

Key technical implementations include:

  • Real Inference in the Browser: Uses Transformers.js (ONNX backend) to run the DistilGPT-2 model. The first load is about 165 MB (fp16 precision), and other GPT-2 variants (e.g., gpt2-medium) can be selected via URL parameters.
  • Progressive Weight Loading: Embeddings are fetched on demand via HTTP Range requests (3 KB per token), attention weights are lazily loaded (7 MB), and the Cache API is used to persist downloaded weights.
  • Stable Sampling Mechanism: Ensures repeatable sampling results from the same distribution, with temperature adjustments taking effect immediately without re-running inference.
  • Tech Stack: The frontend uses native JavaScript (ES modules) with DOM and SVG; there is no build step, and all styles live in a single CSS file.
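
To make the on-demand embedding fetch concrete, here is a minimal sketch of how a byte range for a single token could be computed. It assumes the embedding matrix is stored as a flat, row-major file of little-endian float32 values with no header, which is an assumption about the file layout rather than something the article states. With DistilGPT-2's hidden size of 768, one row is 768 × 4 = 3072 bytes, which matches the 3 KB per token figure. The function names are hypothetical.

```python
import struct


def embedding_range_header(token_id, hidden_size=768, bytes_per_value=4):
    """Build the HTTP Range header covering one token's embedding row.
    Assumes a flat row-major matrix with no file header (an assumption)."""
    row_bytes = hidden_size * bytes_per_value
    start = token_id * row_bytes
    end = start + row_bytes - 1  # HTTP byte ranges are inclusive at both ends
    return {"Range": f"bytes={start}-{end}"}


def decode_row(payload):
    """Decode a fetched row, assuming little-endian float32 values."""
    count = len(payload) // 4
    return struct.unpack(f"<{count}f", payload)
```

A server that honors the header replies with status 206 (Partial Content) and only the requested bytes, so the browser never needs to download the full embedding matrix up front.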

Section 05

Educational Design and Application Value

Educational Design:

  • Spanish Content: Each chapter includes a main explanation, collapsible formulas (Profundizar), and hands-on experiments (Pruébalo).
  • Shareable and Demo-Friendly: The prompt text is encoded in the URL (?p=...) so it can be shared; add ?presentar or press the P key to enter demo mode (content fades in gradually, with keyboard shortcuts to advance).

Application Value:

  • Learners: Balances abstraction and detail, so beginners can get started and advanced users can dig deeper.
  • Educators: Can be used directly in the classroom; demo mode supports live explanation, and shareable links support exploration after class.
  • Researchers: Can check their understanding of attention mechanisms, observe the effects of sampling strategies, and see how parameter changes affect the output.
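
The shareable-link idea can be illustrated with a short sketch that builds and parses such URLs. The `p` parameter and the `presentar` flag come from the description above; the exact percent-encoding the tool uses and the helper names below are assumptions for illustration only.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

BASE_URL = "https://mahiler1909.github.io/llm-pipeline-visualizer/"


def share_link(prompt, base_url=BASE_URL, present=False):
    """Encode a prompt into the ?p= query parameter (hypothetical helper)."""
    query = urlencode({"p": prompt}, quote_via=quote)  # spaces become %20
    if present:
        query += "&presentar"  # bare flag for demo mode
    return f"{base_url}?{query}"


def prompt_from_link(url):
    """Recover the prompt from a shared link; empty string if absent."""
    params = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return params.get("p", [""])[0]
```

For example, `share_link("El gato")` yields a link ending in `?p=El%20gato`, and `prompt_from_link` on that link returns the original text.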

Section 06

Tool Comparison and Summary

Comparison with Other Tools

| Feature | LLM Pipeline Visualizer | Traditional Tutorials | Interactive Notebooks |
|---|---|---|---|
| No installation required | ✅ Runs directly in browser | | ❌ Requires Jupyter |
| Real model data | | ❌ Simplified examples | |
| Progressive exploration | ✅ 7 structured chapters | | ⚠️ Depends on user organization |
| Real-time interaction | | | |
| Demo-friendly | ✅ Dedicated mode | | ⚠️ |
| Shareable state | ✅ URL-encoded | | |

Summary: This tool balances realism with understandability, depth with ease of use, teaching with demonstration, and light weight with full functionality. It gives LLM learners a transparent window into what is usually a black box, and it is a strong example of a technical educational tool.