Controlled Agent Runtime: Building an Observable, Replayable Controlled Multi-Agent Workflow Runtime

A production-ready multi-agent runtime framework that achieves perception isolation via ActorView, ensures state safety with DomainEvent, supports regression testing through Golden Replay, and demonstrates complex interactions like hidden information handling, delegated actions, and long-term memory using the Hazard Lab scenario.

Tags: Agent runtime, multi-agent, event sourcing, LangGraph, observability, regression testing, ActorView, DomainEvent, Golden Replay, state management

Published 2026-06-01 14:15 | Recent activity 2026-06-01 14:26 | Estimated read 8 min

Section 01

Introduction: Controlled Agent Runtime: Building an Observable, Replayable Controlled Multi-Agent Workflow Runtime

Section 02

Original Author and Source

Original Author/Maintainer: yukinorin775780
Source Platform: GitHub
Original Title: Controlled Agent Sim Runtime
Original Link: https://github.com/yukinorin775780/controlled-agent-runtime-ai-rd
Publication Time: June 2026

Section 03

Background: Why Do Agents Need a "Controlled" Runtime?

The explosive growth of Large Language Model (LLM) agents has brought unprecedented automation capabilities, but it has also exposed a core tension between the model's open-ended behavior and the determinism that production environments require.

LLMs are inherently probabilistic: the same input may produce different outputs. This property is an advantage in creative writing or brainstorming, but it becomes a source of risk in production systems that require precise state management. When agents can call tools, modify databases, or send emails, "uncontrolled" behavior can have serious consequences.

Most current agent frameworks in the industry (such as LangChain, AutoGPT, and the OpenAI Assistants API) adopt the "prompt engineering + tool calling" paradigm, delegating much of the responsibility to the model's "comprehension ability". This approach works well in the prototype phase, but it faces several fundamental challenges:

The first is state consistency. If an agent modifies a database record during a conversation, how can we ensure that the modification is predictable, auditable, and reversible? The second is visibility. When an agent makes a decision, can developers tell exactly what information it "saw" and what factors it "considered"? The third is testability. How can agent behavior be regression tested without calling expensive LLM APIs?

The Controlled Agent Runtime project is designed to answer these questions.

Section 04

Core Idea: Separating Intent and Execution

The core architectural principle of the project can be summarized in one sentence: LLMs are responsible for interpreting and expressing intent, while deterministic systems handle state changes. This is not simply a matter of having the LLM generate code that is then executed; it is an architectural boundary enforced by the layers described below:

Section 05

Intent Layer

LLM-facing nodes are responsible for:

Understanding the intent of user input
Generating natural language responses and "expressions" (barks)
Proposing "suggestions" for state changes

This layer fully leverages the language understanding and generation capabilities of LLMs, but does not directly manipulate any system state.
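A minimal sketch of what "proposing without manipulating" could look like. It assumes the model is asked to reply with a small JSON object; the names `Proposal`, `parse_proposal`, and `ALLOWED_ACTIONS` are hypothetical and are not taken from the project's code.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical whitelist of actions an agent may suggest (illustrative only).
ALLOWED_ACTIONS = {"move", "inspect", "pick_up"}


@dataclass(frozen=True)
class Proposal:
    """A suggested state change: plain data with no ability to mutate state."""
    actor_id: str
    action: str
    target: str
    bark: str  # natural-language line the agent speaks


def parse_proposal(actor_id: str, raw_llm_output: str) -> Optional[Proposal]:
    """Turn raw model text into a typed proposal, or None if it is unusable."""
    try:
        data = json.loads(raw_llm_output)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    action = data.get("action")
    target = data.get("target")
    # Reject anything that is not a known action with a string target.
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        return None
    if not isinstance(target, str):
        return None
    return Proposal(actor_id, action, target, str(data.get("bark", "")))
```

Because the proposal is a frozen value object, the LLM-facing node can only hand it to another component; whether it is accepted is decided elsewhere.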

Section 06

Execution Layer

Deterministic systems are responsible for:

Game mechanics such as movement, inspection, and inventory management
Final submission of state changes
Event sourcing and persistence
Replayable execution records

This layer ensures all state changes are predictable, testable, and auditable.
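As an illustration of a deterministic mechanic, the sketch below validates a requested move against the world's rules before committing it. The `World` structure and `apply_move` function are assumptions made for this example, not the project's actual API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class World:
    """Hypothetical world state for the example."""
    positions: Dict[str, str]            # actor_id -> current room
    exits: Dict[str, Set[str]]           # room -> rooms reachable from it
    log: List[Tuple[str, str, str]] = field(default_factory=list)


def apply_move(world: World, actor_id: str, destination: str) -> bool:
    """Commit a move only if the rules allow it; return whether it was applied."""
    current = world.positions.get(actor_id)
    if current is None or destination not in world.exits.get(current, set()):
        return False  # rejected: state is left untouched
    world.positions[actor_id] = destination
    world.log.append((actor_id, current, destination))  # replayable record
    return True
```

The same inputs always yield the same result, so this function can be covered by ordinary unit tests with no model in the loop.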

Section 07

Event Layer

Connecting the two layers is a strongly typed DomainEvent system. All state changes must be represented via events, which are processed uniformly through EventDrain. This design brings several benefits:

Unified Interface: Regardless of which agent or system the change comes from, it is handled via the same event mechanism
Serializable: Events can be persisted, transmitted, and replayed
Auditable: The complete event log serves as the system's audit trail
Testable: Event sequences can be constructed for regression testing without calling LLMs
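The properties listed above can be sketched with a small event log. This is an illustrative reconstruction under stated assumptions: the event classes, the internals of this `EventDrain`, and the `replay` helper are hypothetical and only borrow the names used in the article.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class ItemPickedUp:
    actor_id: str
    item: str


@dataclass(frozen=True)
class ItemDropped:
    actor_id: str
    item: str


# Registry used to rebuild typed events from their serialized form.
EVENT_TYPES = {cls.__name__: cls for cls in (ItemPickedUp, ItemDropped)}


class EventDrain:
    """Single entry point through which every state change passes."""

    def __init__(self) -> None:
        self.inventory: Dict[str, List[str]] = {}
        self.log: List[str] = []  # serialized events, usable as an audit trail

    def submit(self, event) -> None:
        self._apply(event)
        self.log.append(json.dumps({"type": type(event).__name__, **asdict(event)}))

    def _apply(self, event) -> None:
        items = self.inventory.setdefault(event.actor_id, [])
        if isinstance(event, ItemPickedUp):
            items.append(event.item)
        elif isinstance(event, ItemDropped) and event.item in items:
            items.remove(event.item)


def replay(log: Iterable[str]) -> EventDrain:
    """Rebuild state from a recorded log without calling any LLM."""
    drain = EventDrain()
    for line in log:
        record = json.loads(line)
        event_cls = EVENT_TYPES[record.pop("type")]
        drain.submit(event_cls(**record))
    return drain
```

A regression test in this style records a log once, stores it, and later asserts that `replay(log)` still produces the same state and the same log.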

Section 08

ActorView: Perception Isolation

ActorView is one of the most elegant designs in the project. It solves a common problem in multi-agent systems: what information should each agent "see"?

In real-world multi-agent scenarios, information is often distributed and asymmetric. One agent may know certain information that another does not; some information is public, some is private. The traditional approach of "feeding all context to the LLM" is neither efficient (token waste) nor safe (information leakage).

ActorView achieves fine-grained perception control through the following mechanisms:

World State Filtering: Filter visible world objects based on the agent's position, capabilities, and role
History Trimming: Only provide historical events that the agent has access to
Private Memory Injection: Each agent has its own memory service to store learned knowledge
Peer State Visibility: Control which other agents' states an agent can see

This design makes "hidden information" a first-class citizen of the system. In the Hazard Lab demo scenario, the Scout Agent can detect hidden gas traps, while other agents cannot. This is not achieved by telling the LLM in the prompt to "pretend you can't see", but through real information isolation: the hidden data never reaches the other agents' context.
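The filtering idea can be sketched as a pure function that builds each agent's view before any prompt is assembled. All types here, the `detect_hidden` capability name, and `build_actor_view` itself are assumptions made for illustration; the project's real ActorView may be structured differently.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class WorldObject:
    name: str
    room: str
    hidden: bool = False


@dataclass(frozen=True)
class Actor:
    actor_id: str
    room: str
    capabilities: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class HistoryEvent:
    text: str
    visible_to: FrozenSet[str]  # actor ids allowed to see this event


def build_actor_view(
    actor: Actor,
    objects: List[WorldObject],
    history: List[HistoryEvent],
    memories: Dict[str, List[str]],
) -> dict:
    """Return only the data this actor is allowed to perceive."""
    can_detect = "detect_hidden" in actor.capabilities
    return {
        # World state filtering: same room, and hidden objects need the capability.
        "objects": [
            o.name for o in objects
            if o.room == actor.room and (not o.hidden or can_detect)
        ],
        # History trimming: only events addressed to this actor.
        "history": [e.text for e in history if actor.actor_id in e.visible_to],
        # Private memory injection: only this actor's own memories.
        "memory": list(memories.get(actor.actor_id, [])),
    }
```

With a hidden `gas_trap` in the room, a scout holding `detect_hidden` receives it in `objects`, while an actor without that capability gets a view in which the trap does not appear at all.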
