Zing Forum


retention.sh: An Always-On Workflow Referee System for AI Coding Agents

A quality assurance system for AI coding agents that catches missing steps, verifies execution quality through four always-on hooks (session recovery, workflow detection, tool tracking, completion interception), and supports workflow replay to reduce costs.

Tags: AI agent, workflow, QA, quality assurance, replay, cost optimization, OpenAI, Anthropic, LangChain, CrewAI
Published 2026-04-10 07:40 · Recent activity 2026-04-10 07:46 · Estimated read: 6 min

Section 01

retention.sh: An Always-On Workflow Referee System for AI Coding Agents (Introduction)

retention.sh is a quality assurance system for AI coding agents that targets their "confident mistakes", such as skipping tests or missing steps. It captures missing steps and verifies execution quality through four always-on hooks (session recovery, workflow detection, tool tracking, completion interception), and it supports workflow replay that reduces costs by 60-70%. At its core it is an "always-on workflow referee" that issues hard rulings (PASS/FAIL/BLOCKED) rather than acting as a simple logging tool.


Section 02

Project Background: Reliability Dilemma of AI Agents and Core Positioning

With the rise of AI coding agents such as Claude Code and Cursor Composer, developers face the agents' "confident mistakes": claiming a task is complete while skipping tests, missing key steps, or ignoring context. retention.sh positions itself as an "always-on workflow referee" that systematically captures missing steps and intercepts problems before they land. Its core idea is to surface the tests an agent skipped, the steps it forgot, and the context it missed, prevent those problems from recurring, and deliver clear quality rulings.


Section 03

Three Core Functions and Four Hook Mechanisms

Core Functions:
1. Quality Check: track the execution process, identify completed and missing steps, and output a hard ruling.
2. Workflow Replay: capture expensive executions and replay them at 60-70% lower cost, with effectiveness verified by strict refereeing.
3. Full Tracking: record tool calls, screenshots, evidence, and cost analysis, and generate shareable links.

Four Hooks:
- on-session-start: recover unfinished work.
- on-prompt: detect the workflow type and inject the required steps.
- on-tool-use: track tool calls and prompt for missing steps.
- on-stop: intercept "completion" claims for tasks that are not actually finished.
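
The hook lifecycle above can be sketched as a minimal referee object. The class, the step names, and the PASS/BLOCKED logic below are illustrative assumptions based only on this description, not retention.sh's actual API; a real referee would also distinguish FAIL for steps that ran but produced bad results.

```python
# Hypothetical sketch of the hook lifecycle described above.
REQUIRED_STEPS = {"run_tests", "lint", "review_diff"}  # assumed workflow steps

class WorkflowReferee:
    def __init__(self, required_steps):
        self.required = set(required_steps)
        self.seen = set()

    def on_prompt(self, prompt):
        # Detect the workflow type and return the steps to inject.
        return sorted(self.required)

    def on_tool_use(self, tool_name):
        # Track each tool call against the required steps.
        self.seen.add(tool_name)

    def on_stop(self):
        # Intercept "completion": issue a hard ruling instead of
        # trusting the agent's own claim that the task is done.
        missing = self.required - self.seen
        return "PASS" if not missing else "BLOCKED"

referee = WorkflowReferee(REQUIRED_STEPS)
referee.on_tool_use("run_tests")
ruling = referee.on_stop()  # "BLOCKED": lint and review_diff never ran
```

The key design point the article implies is that on-stop returns a ruling rather than a log entry, so an unfinished task cannot quietly be declared complete.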


Section 04

Multi-Platform SDK Support and Privacy Protection

retention.sh offers broad SDK integration, supporting mainstream AI agent frameworks such as OpenAI, Anthropic, LangChain, and CrewAI. It can be enabled with one line of code (e.g., track() for automatic detection, or with an explicitly specified provider).
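
What a one-line track() enablement might look like, as a self-contained sketch: the function body, the record callback, and the provider handling are all assumptions based only on the description above, not the library's documented API.

```python
# Hypothetical sketch of a one-line track() integration.
def track(provider=None):
    """Return a recorder that logs every agent event as a structured dict.

    provider=None stands in for auto-detection (e.g. recognizing an
    OpenAI vs. Anthropic client); passing a name pins the integration.
    """
    events = []

    def record(event_type, **payload):
        event = {"type": event_type, "provider": provider or "auto", **payload}
        events.append(event)
        return event

    return record

record = track(provider="openai")   # or simply track() for auto-detection
event = record("tool_call", name="run_tests")
```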

For privacy, the system automatically redacts sensitive data (API keys, passwords, etc.), generates structured event records, and stores them locally by default in ~/.retention/activity.jsonl, ensuring telemetry does not leak confidential information.
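
A minimal sketch of the redact-then-record flow: the secret patterns and event shape are assumptions, while the one-JSON-object-per-line layout matches the ~/.retention/activity.jsonl file named above.

```python
import json
import re

# Patterns that commonly identify secrets; real scrubbing would be broader.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{10,}"),        # API-key-like tokens
    re.compile(r"(?i)(password|secret)=\S+"),  # key=value credentials
]

def redact(text):
    """Replace anything matching a secret pattern with a placeholder."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

def to_event_line(event):
    """Serialize one structured event record as a JSONL line, post-redaction."""
    clean = {k: redact(v) if isinstance(v, str) else v for k, v in event.items()}
    return json.dumps(clean)

line = to_event_line({"tool": "bash", "args": "password=hunter2 ls"})
# `line` is safe to append to ~/.retention/activity.jsonl
```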


Section 05

Actual Effect Data and Team Collaboration

Effect Data: replay cost savings of 63-73%, an 89% referee consistency rate, and zero corrections needed across the 3 workflow families tested.

Team Collaboration: share workflow memory and establish unified team quality standards via team codes (e.g., creating a team yields a code such as K7XM2P, which members set in the RETENTION_TEAM environment variable to join).
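
The env-var join step could be modeled like this. The RETENTION_TEAM variable name comes from the text above; the fallback to a solo mode when it is unset is an assumption.

```python
import os

def team_code():
    """Resolve the team code from the RETENTION_TEAM environment variable.

    Returning "solo" when the variable is unset is an assumed default,
    not documented behavior.
    """
    return os.environ.get("RETENTION_TEAM", "solo")

os.environ["RETENTION_TEAM"] = "K7XM2P"  # the example code from the article
code = team_code()
```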


Section 06

Applicable Scenarios and Quick Installation

Applicable Scenarios:
1. Engineers: stop agents from skipping test or search steps, and replay repeated workflows at low cost.
2. Team Leaders: see what agents actually executed, which steps they missed, and where costs can be saved.
3. Founders: turn repetitive AI work into reusable operational leverage.

Installation: quick install (curl -sL retention.sh/install.sh | bash) or via pip (pip install retention). Usage example: retention.qa_check(url='http://localhost:3000').
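
As a sketch of the qa_check call shape, with a hypothetical return structure: the real call presumably inspects the running app at the given URL, whereas this stand-in only validates the target and returns a ruling-shaped report.

```python
from urllib.parse import urlparse

def qa_check(url):
    """Hypothetical stand-in for retention.qa_check.

    Validates the target URL and returns a report shaped like the
    article's rulings; the return keys are assumptions.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return {"ruling": "BLOCKED", "reason": "invalid target URL"}
    return {"ruling": "PASS", "target": parsed.netloc}

report = qa_check(url="http://localhost:3000")
```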


Section 07

Conclusion: An Important Innovation in AI Agent Quality Assurance

retention.sh is a notable innovation in AI agent quality assurance. Unlike traditional logging tools, it provides judgment, and its replay capability delivers significant cost savings. As AI agents evolve into production-grade infrastructure, such supervision and verification mechanisms will only grow in importance. Core insight: AI agents need not just more capability but better supervision and verification, and retention.sh is a strong implementation of that idea.