Zing Forum

Worker-Critic Mode: Engineering Practice of AI Agent Collaborative Workflow

An example project demonstrating the Worker-Critic agent workflow architecture. Through comparative experiments under three conditions (baseline, same-model review, and external review), it explores best practices for multi-agent collaboration in generating high-quality technical diagrams.

Tags: Worker-Critic Mode · AI Agent · Multi-Agent Collaboration · Prompt Engineering · Codex · Claude · Quality Review · Experimental Framework
Published 2026-04-07 21:16 · Recent activity 2026-04-07 21:23 · Estimated read: 6 min

Section 01

[Introduction] Worker-Critic Mode: Engineering Practice of AI Agent Collaborative Workflow

The worker-critic-example project open-sourced by PredictiveScienceLab demonstrates the engineering implementation of the Worker-Critic agent workflow mode through a diagram generation task. It builds a comparative framework with three experimental conditions to explore best practices for multi-agent collaboration in generating high-quality technical diagrams, providing reusable experimental references for research on multi-agent collaboration mechanisms.


Section 02

Project Background and Core Issues

As large language models have grown more capable, agent-based applications have proliferated, but a single agent is prone to "drift" (gradually deviating from the initial goal as context accumulates). The Worker-Critic mode borrows from code review mechanisms: an independent Critic agent monitors the quality of the Worker's output. Through a concrete diagram generation task, the project builds a comparative framework with three experimental conditions to quantitatively evaluate the mode's actual benefits.


Section 03

Experimental Design: Three Comparative Conditions

Three experimental conditions are designed:

  1. Condition A (Baseline): A single agent receives the task description and the base prompt and completes diagram generation on its own, serving as the evaluation baseline.
  2. Condition B (Same-Model Review): The Worker session runs continuously, with an additional same-model Critic session (persistent rather than rebuilt on each round) reviewing the SVG and providing feedback.
  3. Condition C (External Review): The Worker session runs continuously, and each review calls an external GPT model (gpt-5.4-pro), incorporating prior reviews to provide a third-party perspective.
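The three conditions above can be captured as a small configuration table. A minimal Python sketch, where the class, field, and condition names are illustrative rather than taken from the project:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Condition:
    """One experimental condition in the Worker-Critic comparison (illustrative)."""
    name: str
    critic: Optional[str]     # None = no critic (baseline condition)
    persistent_critic: bool   # reuse one Critic session across review rounds

# The three conditions described above, expressed as data:
CONDITIONS: List[Condition] = [
    Condition("A-baseline", critic=None, persistent_critic=False),
    Condition("B-same-model-review", critic="same-model", persistent_critic=True),
    Condition("C-external-review", critic="gpt-5.4-pro", persistent_critic=True),
]
```

Expressing the conditions as data rather than branching logic makes it easy to run all three in parallel and to add further conditions (e.g. a multi-Critic variant) later.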

Section 04

Technical Implementation Details

  1. Prompt Engineering: Modular design that separates the base prompt from additional instructions and combines them dynamically via scripts to produce the final prompt.
  2. Multi-Platform Support: Compatible with OpenAI Codex (launch_codex_exec.py) and Anthropic Claude (launch_claude_exec.py), each with its own startup script and runner.
  3. Isolated Environment: Each run lives in an independent temporary directory (/tmp/worker-critic-example-runs//) with its own git repository, supporting parallel runs and complete log retention.
  4. Figma Integration: Optional; reads and writes Figma files via an MCP server, aborting if the permission pre-check fails.
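Two of the points above, modular prompt composition and isolated run directories, can be sketched in a few lines. `compose_prompt` and `make_run_dir` are hypothetical helpers illustrating the described design, not the project's actual scripts:

```python
import subprocess
import tempfile
from pathlib import Path
from typing import List

def compose_prompt(base_path: str, extra_instructions: List[str]) -> str:
    """Combine the base prompt file with task-specific extra instructions.

    Mirrors the modular prompt design described above; the separator and
    file handling are assumptions.
    """
    base = Path(base_path).read_text(encoding="utf-8")
    # Drop empty instruction strings, then join the parts with blank lines.
    parts = [base.strip()] + [s.strip() for s in extra_instructions if s.strip()]
    return "\n\n".join(parts)

def make_run_dir(prefix: str = "worker-critic-run-") -> str:
    """Create an isolated temp directory with its own git repo for one run."""
    run_dir = tempfile.mkdtemp(prefix=prefix)
    subprocess.run(["git", "init", "--quiet", run_dir], check=True)
    return run_dir
```

Because each run gets a fresh directory and repository, conditions can execute in parallel without interfering, and every intermediate artifact stays inspectable afterwards.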

Section 05

Implementation of the Review Mechanism

  1. External Review Script: scripts/external_review.py takes the project description, the current SVG, and prior reviews, calls the OpenAI API, and outputs a detailed Markdown review plus a JSON structured summary.
  2. Historical Records: Reviews are saved in runs//reviews/; subsequent reviews can include this history to preserve context continuity.
  3. Claude Review: scripts/anthropic_review.py calls the Claude model on Azure Foundry, supporting multi-model selection and recording compatibility information.
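The shape of an external review request can be illustrated with a small helper. The message structure and instruction wording below are assumptions about what scripts/external_review.py might send, not its actual code:

```python
from typing import Dict, List

def build_review_request(
    project_desc: str, svg: str, history: List[Dict[str, str]]
) -> List[Dict[str, str]]:
    """Assemble chat messages for an external Critic review (hypothetical sketch)."""
    # Fold prior reviews into a single text block so the Critic sees the
    # full feedback history, as the project's review scripts are described to do.
    history_text = "\n\n".join(
        f"Review {i + 1}:\n{r['summary']}" for i, r in enumerate(history)
    ) or "(no prior reviews)"
    system = (
        "You are a critic reviewing an SVG technical diagram. "
        "Return a detailed Markdown review followed by a JSON summary."
    )
    user = (
        f"Project description:\n{project_desc}\n\n"
        f"Prior reviews:\n{history_text}\n\n"
        f"Current SVG:\n{svg}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]
```

Passing the accumulated review history on every call is what keeps a stateless external model from repeating earlier feedback or contradicting itself across rounds.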

Section 06

Result Collection and Comparative Analysis

scripts/build_comparison_artifacts.py collects the final diagrams from the three conditions and generates:

  • Side-by-side comparison PNGs
  • Iteration process GIFs
  • A summary report listing the run root directory, per-condition frame counts, and artifact paths, making the differences between conditions visible at a glance.
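The summary report's assembly can be sketched as follows; the function and field names are illustrative rather than taken from build_comparison_artifacts.py:

```python
import json
from typing import Dict

def build_summary(
    run_root: str, frame_counts: Dict[str, int], artifact_paths: Dict[str, str]
) -> str:
    """Serialize the comparison summary described above as JSON (field names assumed)."""
    report = {
        "run_root": run_root,            # directory containing all three runs
        "frame_counts": frame_counts,    # iteration frames per condition
        "artifact_paths": artifact_paths # e.g. side-by-side PNG, iteration GIF
    }
    return json.dumps(report, indent=2)
```

A machine-readable summary like this lets later analysis scripts locate every artifact without re-scanning the run directories.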

Section 07

Engineering Best Practices and Research Value

Best practices: prompt version control, environment isolation, multi-platform abstraction, complete log recording, and observability by design (real-time observation via tmux). Research value: testable prompt strategies, comparison of review-mode effects, study of the impact of Critic feedback, and extension to other tasks. Industrial application scenarios: document writing, code generation, design-draft creation, and other tasks requiring high-quality iteration.


Section 08

Limitations and Future Directions

Current limitations: compatibility issues with specific model names. Future directions: supporting more AI platforms, introducing a multi-Critic voting mode, and investigating the drift problem of the Critic itself.