
agent-replay: A Terminal Debugging and Replay Tool for Agent Workflows

agent-replay is a terminal user interface (TUI) debugging tool for replaying non-deterministic agent workflow execution traces step by step, helping developers understand an agent's decision-making process and locate issues.

Tags: agent-replay, agent debugging, TUI, workflow replay, agent debugging tool, execution traces, LLM, terminal interface

Published 2026-06-01 01:13 | Recent activity 2026-06-01 01:22 | Estimated read 8 min

Section 01

agent-replay: A Terminal Debugging and Replay Tool for Agent Workflows

agent-replay is a terminal user interface (TUI) debugging tool developed by Chopin998. It supports step-by-step replay and debugging of non-deterministic agent workflow execution traces, helping developers understand an agent's decision-making process and locate issues. The project is open source on GitHub (link) and was last updated at 2026-05-31T17:13:20Z.

Core Value: Addresses the non-determinism problem in agent debugging by making the execution process visible and interactively explorable.

Section 02

Project Background and Challenges

With the rapid development of Large Language Models (LLMs), agent-based automated workflows are becoming increasingly popular, but autonomy brings debugging difficulties:

Agent workflows are non-deterministic: the same input may produce different execution paths, so traditional breakpoint debugging, which relies on re-running the same path, is often ineffective.
Developers struggle to reproduce issues, understand decision logic, and track tool call chains, and they lack visibility into the execution process.

agent-replay was created to address these pain points.

Section 03

Core Features

agent-replay provides three key capabilities:

Trace Parsing and Import

Supports parsing standard JSON-format execution trace logs (including execution status, thinking process, tool calls and parameters, results, timestamps, etc.). After import, the complete execution process can be reproduced locally.
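
The article does not reproduce the trace schema, so the field names below (steps, status, thought, tool_call, params, result, timestamp) are assumptions for illustration, not agent-replay's actual format. Under that assumption, a minimal sketch of turning a JSON trace into an ordered list of steps might look like this:

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Step:
    """One recorded step of an agent run (hypothetical schema)."""
    index: int
    status: str
    thought: str
    tool: Optional[str] = None
    params: dict[str, Any] = field(default_factory=dict)
    result: Any = None
    timestamp: Optional[str] = None


def parse_trace(text: str) -> list[Step]:
    """Parse a JSON trace string into an ordered list of steps."""
    raw = json.loads(text)
    steps = []
    for i, item in enumerate(raw["steps"]):
        # A step without a tool call is treated as a pure reasoning/answer step.
        call = item.get("tool_call") or {}
        steps.append(Step(
            index=i,
            status=item.get("status", "unknown"),
            thought=item.get("thought", ""),
            tool=call.get("name"),
            params=call.get("params", {}),
            result=item.get("result"),
            timestamp=item.get("timestamp"),
        ))
    return steps


# Made-up sample data in the assumed schema.
SAMPLE = """
{"steps": [
  {"status": "ok", "thought": "I need the weather.",
   "tool_call": {"name": "get_weather", "params": {"city": "Oslo"}},
   "result": "4C, rain", "timestamp": "2026-05-31T17:00:00Z"},
  {"status": "ok", "thought": "Answer the user.",
   "result": "It is 4C and raining in Oslo.",
   "timestamp": "2026-05-31T17:00:02Z"}
]}
"""

steps = parse_trace(SAMPLE)
for s in steps:
    print(s.index, s.tool or "(no tool)", s.status)
```

Because the steps are plain data once parsed, replaying them locally never calls the model again, which is what makes the replay repeatable even though the original run was not.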

Step-by-Step Execution View

Provides a time-travel debugging experience similar to a video player:

Step forward/backward browsing
Jump to a specific time point
Pause at any step for inspection
Quickly locate tool calls or decision nodes
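
The navigation operations above can be sketched as a cursor over the recorded steps. The class name ReplayCursor, its methods, and the step keys (timestamp, thought, tool) are hypothetical and chosen for illustration; the article does not describe how agent-replay implements navigation internally.

```python
from typing import Optional


class ReplayCursor:
    """Moves a position over recorded steps without re-running the agent."""

    def __init__(self, steps: list[dict]):
        if not steps:
            raise ValueError("trace has no steps")
        self.steps = steps
        self.pos = 0

    @property
    def current(self) -> dict:
        return self.steps[self.pos]

    def forward(self) -> dict:
        # Clamp at the last step instead of wrapping around.
        self.pos = min(self.pos + 1, len(self.steps) - 1)
        return self.current

    def backward(self) -> dict:
        # Clamp at the first step.
        self.pos = max(self.pos - 1, 0)
        return self.current

    def jump_to_time(self, timestamp: str) -> dict:
        """Jump to the last step recorded at or before the given timestamp."""
        # Assumes same-format ISO 8601 UTC strings, which sort chronologically as text.
        hits = [i for i, s in enumerate(self.steps) if s["timestamp"] <= timestamp]
        self.pos = hits[-1] if hits else 0
        return self.current

    def next_tool_call(self) -> Optional[dict]:
        """Advance to the next step that contains a tool call, if there is one."""
        for i in range(self.pos + 1, len(self.steps)):
            if self.steps[i].get("tool"):
                self.pos = i
                return self.current
        return None


# Made-up trace for demonstration.
trace = [
    {"timestamp": "2026-05-31T17:00:00Z", "thought": "plan", "tool": None},
    {"timestamp": "2026-05-31T17:00:01Z", "thought": "search", "tool": "web_search"},
    {"timestamp": "2026-05-31T17:00:03Z", "thought": "read", "tool": "fetch_page"},
    {"timestamp": "2026-05-31T17:00:05Z", "thought": "answer", "tool": None},
]

cursor = ReplayCursor(trace)
print(cursor.next_tool_call()["tool"])
print(cursor.jump_to_time("2026-05-31T17:00:04Z")["thought"])
print(cursor.backward()["thought"])
```

Pausing needs no extra state in this design: the cursor simply stays where it is until the next navigation call.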

Detailed Inspection View

Displays fine-grained information for each step:

Prompt context
Raw model output
Reasoning chain (if supported)
Tool call details (name, parameters, return value)
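
As a rough illustration of how the fields above could be laid out for one step, the sketch below renders a step as plain text. The function render_step and the step keys (prompt, output, reasoning, tool_call) are assumptions; the real tool draws this information inside its TUI, and its layout is not described in the article.

```python
import json


def render_step(step: dict) -> str:
    """Format one step's details as indented plain text."""
    lines = [f"Step {step['index']}"]
    lines.append("Prompt context:")
    lines.append("  " + step.get("prompt", "(not recorded)"))
    lines.append("Raw model output:")
    lines.append("  " + step.get("output", "(not recorded)"))
    # Reasoning is optional: not every model or framework exposes it.
    if step.get("reasoning"):
        lines.append("Reasoning:")
        lines.append("  " + step["reasoning"])
    call = step.get("tool_call")
    if call:
        lines.append(f"Tool call: {call['name']}")
        lines.append("  params: " + json.dumps(call.get("params", {}), sort_keys=True))
        lines.append("  return: " + json.dumps(call.get("result")))
    return "\n".join(lines)


# Made-up step for demonstration.
example_step = {
    "index": 0,
    "prompt": "What is the weather in Oslo?",
    "output": "I should call get_weather.",
    "reasoning": "The user asked about current conditions.",
    "tool_call": {"name": "get_weather", "params": {"city": "Oslo"}, "result": "4C, rain"},
}

print(render_step(example_step))
```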

Section 04

Technical Implementation

agent-replay is developed in Python, with core modules including:

app.py: Main application entry, handling interaction and interface rendering
parser.py: Parses JSON trace files and extracts execution steps
mock_data.json: Sample data showing the trace format

TUI Design Advantages

Choosing a terminal interface over a graphical interface has the following benefits:

Lightweight: No graphical environment is needed, so it can be used on remote servers over SSH
Fast startup: No graphical rendering overhead
Easy integration: Fits into existing command-line workflows
Resource-friendly: Suitable for resource-constrained environments

Section 05

Usage Scenarios and Integration

Applicable Scenarios

Problem Diagnosis: Replay traces to find the specific step where unexpected behavior occurs
Behavior Understanding: Analyze how the agent decomposes tasks, selects tools, and processes intermediate results
Regression Testing: Save and compare traces to verify that behavior stays stable
Team Collaboration: Share traces with bug reports so teammates can reproduce the problem scenario
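
For the regression-testing scenario, one simple way to compare a saved trace with a new one is to look for the first point where the tool call sequences differ. This is a sketch under assumptions: the helper names and the "tool" key are hypothetical, and the article does not say whether agent-replay ships a comparison feature.

```python
from itertools import zip_longest
from typing import Optional


def tool_sequence(steps: list[dict]) -> list[str]:
    """Keep only the tool names, in call order."""
    return [s["tool"] for s in steps if s.get("tool")]


def first_divergence(baseline: list[dict], candidate: list[dict]) -> Optional[int]:
    """Return the index of the first differing tool call, or None if they match."""
    # zip_longest pads the shorter sequence with None, so a missing or
    # extra trailing call also counts as a divergence.
    pairs = zip_longest(tool_sequence(baseline), tool_sequence(candidate))
    for i, (expected, actual) in enumerate(pairs):
        if expected != actual:
            return i
    return None


# Made-up traces for demonstration.
baseline = [{"tool": "search"}, {"tool": None}, {"tool": "fetch"}, {"tool": "summarize"}]
candidate = [{"tool": "search"}, {"tool": "summarize"}]

print(first_divergence(baseline, candidate))
print(first_divergence(baseline, baseline))
```

Since agent runs are non-deterministic, an exact match on tool sequences may be too strict for some workflows; a looser check, such as comparing only the set of tools used, may suit those cases better.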

Compatibility

Supports frameworks that output standard JSON traces:

LangChain
LlamaIndex
AutoGPT
Custom agent implementations

parser.py can be extended to support custom trace formats.
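
One possible way to structure such an extension is a small registry of adapter functions, each normalizing one input format into a common step shape. Everything here is hypothetical: the registry, the "events" format and its keys (message, action, args, observation) are invented for illustration and do not reflect how parser.py is actually organized.

```python
from typing import Callable

Adapter = Callable[[dict], list[dict]]
ADAPTERS: dict[str, Adapter] = {}


def register(fmt: str) -> Callable[[Adapter], Adapter]:
    """Decorator that registers an adapter under a format name."""
    def wrap(fn: Adapter) -> Adapter:
        ADAPTERS[fmt] = fn
        return fn
    return wrap


@register("events")
def from_events(raw: dict) -> list[dict]:
    """Normalize a hypothetical flat event log into common step dicts."""
    steps = []
    for ev in raw["events"]:
        steps.append({
            "thought": ev.get("message", ""),
            "tool": ev.get("action"),
            "params": ev.get("args", {}),
            "result": ev.get("observation"),
        })
    return steps


def load(raw: dict, fmt: str) -> list[dict]:
    """Dispatch to the adapter registered for the given format."""
    try:
        adapter = ADAPTERS[fmt]
    except KeyError:
        raise ValueError(f"no adapter registered for format {fmt!r}") from None
    return adapter(raw)


# Made-up custom log for demonstration.
custom_log = {"events": [
    {"message": "look up docs", "action": "search", "args": {"q": "tui"}, "observation": "3 hits"},
    {"message": "done"},
]}

for step in load(custom_log, "events"):
    print(step["tool"], step["thought"])
```

With this layout, supporting another framework means adding one decorated function, and the replay code only ever sees the common step shape.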

Section 06

Comparison and Project Significance

Comparison with Traditional Tools

Feature	Traditional Log Viewing	agent-replay
Structured Display	Text search	Interactive step browsing
Context Understanding	Manual association	Automatic association of prompts and outputs
Time Travel	None	Supports forward/backward/jump
Visualization	Plain text	TUI interface with clear hierarchy
Usability	Requires familiarity with log format	Intuitive keyboard operations

Project Significance

Fills a gap in the agent development toolchain by providing much-needed observability tooling
The open-source model invites community collaboration, which can extend support to more frameworks and formats

Section 07

Usage Recommendations and Conclusion

Usage Recommendations

Ensure Trace Completeness: The agent framework needs to record complete execution information (prompts, responses, tool calls, etc.)
Pay Attention to Privacy and Security: Traces may contain sensitive information, so take care when storing and sharing them
Combine with Log Analysis: agent-replay is suited to interactive exploration; batch analysis should be paired with log tools
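
On the privacy point, one precaution is to scrub a trace before sharing it. The sketch below is an illustration only: the key list and the secret pattern are assumptions that would need tuning for real data, and nothing in the article suggests agent-replay performs redaction itself.

```python
import re

# Hypothetical list of field names whose values should never be shared.
SENSITIVE_KEYS = {"api_key", "authorization", "password", "token"}
# Illustrative pattern only: strings shaped like "sk-" plus 8 or more characters.
SECRET_PATTERN = re.compile(r"sk-[A-Za-z0-9]{8,}")


def redact(value):
    """Return a redacted copy of a JSON-like value; the input is not modified."""
    if isinstance(value, dict):
        return {
            k: "[REDACTED]" if k.lower() in SENSITIVE_KEYS else redact(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact(v) for v in value]
    if isinstance(value, str):
        return SECRET_PATTERN.sub("[REDACTED]", value)
    return value


# Made-up trace for demonstration.
raw_trace = {"steps": [{
    "prompt": "use key sk-abcdefgh1234 please",
    "tool_call": {
        "name": "http_get",
        "params": {"Authorization": "Bearer xyz", "url": "https://example.com"},
    },
}]}

clean_trace = redact(raw_trace)
print(clean_trace["steps"][0]["prompt"])
print(clean_trace["steps"][0]["tool_call"]["params"])
```

Pattern-based redaction can miss secrets that do not match the expected shape, so a manual review before sharing is still advisable.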

Conclusion

agent-replay turns opaque execution processes into interactive, browsable traces, lowering the barrier to agent debugging. As agent technology develops rapidly, tools like this are a sign that the agent development ecosystem is maturing.
