
llm_sim: An Observable Simulator for Internal Behaviors of Large Language Models

A Python project for educational purposes that simulates the complete reasoning process of large language models through a modular architecture, including prompt construction, tokenization, reasoning agent, tool calling, and token-by-token generation, while providing full JSON execution trace visualization.

Tags: LLM, educational tool, simulator, observability, Python, token generation, tool calling, teaching

Published 2026-03-28 20:44 | Recent activity 2026-03-28 20:48 | Estimated read 7 min


Section 01

llm_sim: Observable LLM Internal Behavior Simulator (Introduction)

llm_sim is a Python project designed specifically for educational purposes. It simulates the complete reasoning process of large language models through a modular architecture (including prompt construction, tokenization, reasoning agent, tool calling, and token-by-token generation) and provides JSON execution trace visualization. Its core value lies in transparently revealing the black-box reasoning process of LLMs, making it suitable for teaching, debugging understanding, or architecture learning.

Section 02

Project Background: An Educational Tool to Address the LLM Black Box Problem

The reasoning process of real large language models (LLMs) is usually a black box, making it difficult to intuitively understand their internal mechanisms. llm_sim does not aim for real model performance; instead, it focuses on clarity and observability. By explicitly recording every intermediate step into JSON trace files, it allows users to deeply explore the model's 'thinking process' and helps learners understand the working principles of LLMs.

Section 03

Architecture Design: A Highly Decoupled Modular System

llm_sim adopts a modular architecture in which each component depends only on the Trace data class, which keeps the system maintainable and extensible. Core modules include: trace.py (execution trace recording), prompt_builder.py (prompt template combination), tokenizer.py (dynamic vocabulary tokenization), llm_core.py (token-by-token generation logic), tools.py (calculator and knowledge base tools), agent.py (reasoning layer), and pipeline.py (top-level orchestration). Key design decisions: component isolation, unified interface, dynamic vocabulary, and full traceability.
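
To make the "each component depends only on Trace" idea concrete, here is a minimal sketch of how a trace recorder and a dynamic-vocabulary tokenizer could fit together. It is not the project's actual code: the `Trace.record` and `to_json` methods, the `DynamicTokenizer` class, the regular expression, and the JSON layout are all assumptions made for illustration.

```python
import json
import re
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Trace:
    """Hypothetical stand-in for the Trace data class: an ordered list of steps."""
    steps: list[dict[str, Any]] = field(default_factory=list)

    def record(self, stage: str, **data: Any) -> None:
        # Every component appends its intermediate result here.
        self.steps.append({"index": len(self.steps), "stage": stage, "data": data})

    def to_json(self) -> str:
        return json.dumps({"steps": self.steps}, indent=2)


class DynamicTokenizer:
    """Regex tokenizer whose vocabulary grows as new tokens are seen (assumed design)."""

    # Words/numbers, or any single non-space punctuation character.
    _pattern = re.compile(r"\w+|[^\w\s]")

    def __init__(self) -> None:
        self.vocab: dict[str, int] = {}

    def encode(self, text: str, trace: Trace) -> list[int]:
        tokens = self._pattern.findall(text)
        # Unknown tokens receive the next free ID; known tokens reuse theirs.
        ids = [self.vocab.setdefault(tok, len(self.vocab)) for tok in tokens]
        trace.record("tokenization", tokens=tokens, ids=ids)
        return ids


if __name__ == "__main__":
    trace = Trace()
    DynamicTokenizer().encode("What is 2 + 2?", trace)
    print(trace.to_json())
```

Because the tokenizer only knows about `Trace`, it can be tested or replaced without touching the other modules, which is the property the architecture description emphasizes.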

Section 04

Detailed Simulation Process: Complete Steps from Input to Output

1. Prompt Construction: Wrap system prompts and user inputs into labeled structured templates; 2. Tokenization: Split text using regular expressions and map to IDs via dynamic vocabulary; 3. Reasoning Agent: Detect intent through heuristic rules (call calculator for mathematical computation, call knowledge base for factual queries); 4. Token-by-token Generation: Candidate sampling, scoring (including repetition penalty and target enhancement), temperature-scaled softmax, and recording of complete candidate lists; 5. Result Assembly: Combine tool outputs and generated text into the final answer.
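
The token-by-token generation step can be sketched as follows. This is an illustrative approximation, not the logic in llm_core.py: the function names, the additive form of the repetition penalty, the reading of "target enhancement" as a fixed score bonus for the intended next token, and the default values are all assumptions.

```python
import math
import random


def softmax_with_temperature(scores: dict[str, float], temperature: float) -> dict[str, float]:
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {tok: s / temperature for tok, s in scores.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def generate_step(candidates, history, target, temperature, rng,
                  repetition_penalty=1.0, target_boost=2.0):
    """Score unique candidate tokens, sample one, and return it with a trace record."""
    scores = {}
    for tok in candidates:
        score = 0.0
        if tok in history:
            score -= repetition_penalty  # discourage tokens already emitted
        if tok == target:
            score += target_boost        # assumed meaning of "target enhancement"
        scores[tok] = score

    probs = softmax_with_temperature(scores, temperature)
    chosen = rng.choices(list(probs), weights=list(probs.values()), k=1)[0]

    # Keep the complete candidate list so a viewer can show every probability.
    step = {
        "candidates": [
            {"token": t, "score": scores[t], "prob": probs[t]} for t in candidates
        ],
        "chosen": chosen,
    }
    return chosen, step


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so a demonstration is repeatable
    token, step = generate_step(["4", "the", "five"], ["the"], "4", 0.7, rng)
    print(token, step)
```

Recording the full candidate list, rather than only the chosen token, is what would let a trace viewer draw per-token probability bars for each generation step.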

Section 05

Interactive Interface and Visualization: Intuitive Exploration of Model Processes

Three interaction methods are provided: 1. Web Interface: Animate pipeline stages, color-code token generation probabilities, and support viewing full traces; 2. Trace Viewer: Collapsible step cards, JSON syntax highlighting, inline probability bar charts, and reasoning trace visualization; 3. CLI Tool: Scripted use for quick testing, outputting JSON trace files and final answers.

Section 06

Operation and Deployment: Multi-environment Support and Data Security

A session isolation mechanism keeps each user's data separate: every user's traces are stored in an independent directory, and an audit log (audit.jsonl) records all operations and is not accessible via browsers. Deployment methods: local development (venv + server.py), production environment (Gunicorn), and Docker containerized deployment.
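
One way to picture the session isolation and audit logging described above is the sketch below. Only the file name audit.jsonl comes from the description; the `SessionStore` class, the `sessions/` directory layout, the event names, and the ID validation rule are assumptions. Keeping the audit file outside the per-session directories is one plausible way to ensure a web server that serves only session files never exposes it.

```python
import json
import re
import tempfile
import time
import uuid
from pathlib import Path

# Hypothetical rule: IDs and trace names may not contain path separators or dots.
_SAFE = re.compile(r"[A-Za-z0-9_-]+")


class SessionStore:
    """Illustrative per-session trace storage with an append-only audit log."""

    def __init__(self, root: Path) -> None:
        self.sessions_dir = root / "sessions"
        self.audit_path = root / "audit.jsonl"  # lives outside sessions/
        self.sessions_dir.mkdir(parents=True, exist_ok=True)

    def new_session(self) -> str:
        session_id = uuid.uuid4().hex
        (self.sessions_dir / session_id).mkdir()
        self._audit("session_created", session_id)
        return session_id

    def save_trace(self, session_id: str, name: str, trace: dict) -> Path:
        # Reject anything that could escape the session's own directory.
        if not (_SAFE.fullmatch(session_id) and _SAFE.fullmatch(name)):
            raise ValueError("unsafe session id or trace name")
        session_dir = self.sessions_dir / session_id
        if not session_dir.is_dir():
            raise KeyError(f"unknown session: {session_id}")
        path = session_dir / f"{name}.json"
        path.write_text(json.dumps(trace, indent=2), encoding="utf-8")
        self._audit("trace_saved", session_id, trace=name)
        return path

    def _audit(self, event: str, session_id: str, **extra) -> None:
        entry = {"ts": time.time(), "event": event, "session": session_id, **extra}
        # JSON Lines: one JSON object per line, appended and never rewritten.
        with self.audit_path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(entry) + "\n")


if __name__ == "__main__":
    store = SessionStore(Path(tempfile.mkdtemp()))
    sid = store.new_session()
    print(store.save_trace(sid, "run1", {"steps": []}))
```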

Section 07

Educational Value and Application Scenarios

Applicable to multiple scenarios: 1. Teaching Demonstration: Show LLM internal principles without complex real models; 2. Architecture Learning: Understand modular design through source code; 3. Debugging Understanding: Observe intermediate steps to understand why answers are generated; 4. Prototype Verification: Quickly verify new architecture ideas; 5. Security Research: Understand the security boundaries of tool calls and knowledge retrieval.

Section 08

Summary and Outlook

llm_sim is a carefully designed teaching tool. With its transparent architecture and visualization capabilities, it makes LLM internal mechanisms accessible. Although it does not aim for realistic model performance, it excels in observability and educational value. It is a useful learning resource for developers, researchers, and students, and its modular design can serve as a base framework for adding more complex simulation features.
