Zing Forum


retention.sh: An Always-On Workflow Referee System for AI Coding Agents

A quality assurance system for AI coding agents that catches missing steps, verifies execution quality through four always-on hooks (session recovery, workflow detection, tool tracking, completion interception), and supports workflow replay to reduce costs.

Tags: AI agent, workflow, QA, quality assurance, replay, cost optimization, OpenAI, Anthropic, LangChain, CrewAI
Published 2026-04-10 07:40 · Recent activity 2026-04-10 07:46 · Estimated read: 6 min

Section 01

retention.sh: An Always-On Workflow Referee System for AI Coding Agents (Introduction)

retention.sh is a quality assurance system for AI coding agents that targets their "confident mistakes", such as skipping tests or missing steps. It captures missing steps and verifies execution quality through four always-on hooks (session recovery, workflow detection, tool tracking, completion interception), and it supports workflow replay that reduces costs by 60-70%. At its core it is an "always-on workflow referee" that issues hard rulings (PASS/FAIL/BLOCKED) rather than acting as a simple logging tool.


Section 02

Project Background: Reliability Dilemma of AI Agents and Core Positioning

With the rise of AI coding agents such as Claude Code and Cursor Composer, developers face the agents' "confident mistakes": claiming a task is complete while skipping tests, missing key steps, or ignoring context. retention.sh positions itself as an "always-on workflow referee" that systematically captures missing steps and intercepts problems before they land. Its core idea is to surface the tests an agent skipped, the steps it forgot, and the context it missed, prevent those problems from recurring, and deliver clear quality rulings.


Section 03

Three Core Functions and Four Hook Mechanisms

Core Functions:
1. Quality Check: track the execution process, identify completed and missing steps, and output a hard ruling.
2. Workflow Replay: capture expensive executions and replay them at 60-70% lower cost, with effectiveness verified by strict refereeing.
3. Full Tracking: record tool calls, screenshots, evidence, and cost analysis, and generate shareable links.

Four Hooks:
- on-session-start: recover unfinished work.
- on-prompt: detect the workflow type and inject the required steps.
- on-tool-use: track tool calls and prompt for missing steps.
- on-stop: intercept "completion" claims for tasks that are not actually finished.
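
The hook lifecycle above can be sketched as a minimal referee object. The class, the step names, and the PASS/BLOCKED logic below are illustrative assumptions based only on this description, not retention.sh's actual API; a real referee would also distinguish FAIL for steps that ran but produced bad results.

```python
# Hypothetical sketch of the hook lifecycle described above.
REQUIRED_STEPS = {"run_tests", "lint", "review_diff"}  # assumed workflow steps

class WorkflowReferee:
    def __init__(self, required_steps):
        self.required = set(required_steps)
        self.seen = set()

    def on_prompt(self, prompt):
        # Detect the workflow type and return the steps to inject.
        return sorted(self.required)

    def on_tool_use(self, tool_name):
        # Track each tool call against the required steps.
        self.seen.add(tool_name)

    def on_stop(self):
        # Intercept "completion": issue a hard ruling instead of
        # trusting the agent's own claim that the task is done.
        missing = self.required - self.seen
        return "PASS" if not missing else "BLOCKED"

referee = WorkflowReferee(REQUIRED_STEPS)
referee.on_tool_use("run_tests")
ruling = referee.on_stop()  # "BLOCKED": lint and review_diff never ran
```

The key design point the article implies is that on-stop returns a ruling rather than a log entry, so an unfinished task cannot quietly be declared complete.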


Section 04

Multi-Platform SDK Support and Privacy Protection

retention.sh offers broad SDK integration, supporting mainstream AI agent frameworks such as OpenAI, Anthropic, LangChain, and CrewAI. It can be enabled with one line of code (e.g., track() for automatic detection, or with an explicitly specified provider).
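
What a one-line track() enablement might look like, as a self-contained sketch: the function body, the record callback, and the provider handling are all assumptions based only on the description above, not the library's documented API.

```python
# Hypothetical sketch of a one-line track() integration.
def track(provider=None):
    """Return a recorder that logs every agent event as a structured dict.

    provider=None stands in for auto-detection (e.g. recognizing an
    OpenAI vs. Anthropic client); passing a name pins the integration.
    """
    events = []

    def record(event_type, **payload):
        event = {"type": event_type, "provider": provider or "auto", **payload}
        events.append(event)
        return event

    return record

record = track(provider="openai")   # or simply track() for auto-detection
event = record("tool_call", name="run_tests")
```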

For privacy, the system automatically redacts sensitive data (API keys, passwords, etc.), generates structured event records, and stores them locally by default in ~/.retention/activity.jsonl, ensuring telemetry does not leak confidential information.
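
A minimal sketch of the redact-then-record flow: the secret patterns and event shape are assumptions, while the one-JSON-object-per-line layout matches the ~/.retention/activity.jsonl file named above.

```python
import json
import re

# Patterns that commonly identify secrets; real scrubbing would be broader.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{10,}"),        # API-key-like tokens
    re.compile(r"(?i)(password|secret)=\S+"),  # key=value credentials
]

def redact(text):
    """Replace anything matching a secret pattern with a placeholder."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

def to_event_line(event):
    """Serialize one structured event record as a JSONL line, post-redaction."""
    clean = {k: redact(v) if isinstance(v, str) else v for k, v in event.items()}
    return json.dumps(clean)

line = to_event_line({"tool": "bash", "args": "password=hunter2 ls"})
# `line` is safe to append to ~/.retention/activity.jsonl
```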


Section 05

Actual Effect Data and Team Collaboration

Effect Data: replay cost savings of 63-73%, an 89% referee consistency rate, and zero corrections needed across the 3 workflow families tested.

Team Collaboration: share workflow memory and establish unified team quality standards via team codes (e.g., creating a team yields a code such as K7XM2P, which members set in the RETENTION_TEAM environment variable to join).
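
The env-var join step could be modeled like this. The RETENTION_TEAM variable name comes from the text above; the fallback to a solo mode when it is unset is an assumption.

```python
import os

def team_code():
    """Resolve the team code from the RETENTION_TEAM environment variable.

    Returning "solo" when the variable is unset is an assumed default,
    not documented behavior.
    """
    return os.environ.get("RETENTION_TEAM", "solo")

os.environ["RETENTION_TEAM"] = "K7XM2P"  # the example code from the article
code = team_code()
```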


Section 06

Applicable Scenarios and Quick Installation

Applicable Scenarios:
1. Engineers: stop agents from skipping test or search steps, and replay repeated workflows at low cost.
2. Team Leaders: see what agents actually executed, which steps they missed, and where costs can be saved.
3. Founders: turn repetitive AI work into reusable operational leverage.

Installation: quick install (curl -sL retention.sh/install.sh | bash) or via pip (pip install retention). Usage example: retention.qa_check(url='http://localhost:3000').
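
As a sketch of the qa_check call shape, with a hypothetical return structure: the real call presumably inspects the running app at the given URL, whereas this stand-in only validates the target and returns a ruling-shaped report.

```python
from urllib.parse import urlparse

def qa_check(url):
    """Hypothetical stand-in for retention.qa_check.

    Validates the target URL and returns a report shaped like the
    article's rulings; the return keys are assumptions.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return {"ruling": "BLOCKED", "reason": "invalid target URL"}
    return {"ruling": "PASS", "target": parsed.netloc}

report = qa_check(url="http://localhost:3000")
```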


Section 07

Conclusion: An Important Innovation in AI Agent Quality Assurance

retention.sh is a notable innovation in AI agent quality assurance. Unlike traditional logging tools, it provides judgment, and its replay capability delivers significant cost savings. As AI agents evolve into production-grade infrastructure, such supervision and verification mechanisms will only grow in importance. Core insight: AI agents need not just more capability but better supervision and verification, and retention.sh is a strong implementation of that idea.