Zing Forum

Worker-Critic Mode: Engineering Practice of AI Agent Collaborative Workflow

An example project demonstrating the Worker-Critic agent workflow architecture. Through comparative experiments under three conditions (baseline, same-model review, and external review), it explores best practices for multi-agent collaboration in generating high-quality technical diagrams.

Tags: Worker-Critic Mode · AI Agent · Multi-Agent Collaboration · Prompt Engineering · Codex · Claude · Quality Review · Experimental Framework
Published 2026-04-07 21:16 · Recent activity 2026-04-07 21:23 · Estimated read: 6 min

Section 01

[Introduction] Worker-Critic Mode: Engineering Practice of AI Agent Collaborative Workflow

The worker-critic-example project open-sourced by PredictiveScienceLab demonstrates the engineering implementation of the Worker-Critic agent workflow mode through a diagram generation task. It builds a comparative framework with three experimental conditions to explore best practices for multi-agent collaboration in generating high-quality technical diagrams, providing reusable experimental references for research on multi-agent collaboration mechanisms.


Section 02

Project Background and Core Issues

As large language models have grown more capable, agent-based applications have proliferated, but a single agent is prone to "drift" (gradually deviating from the initial goal as context accumulates). The Worker-Critic mode borrows from code review mechanisms: an independent Critic agent monitors the quality of the Worker's output. Through a concrete diagram generation task, the project builds a comparative framework with three experimental conditions to quantitatively evaluate the mode's actual benefits.


Section 03

Experimental Design: Three Comparative Conditions

Three experimental conditions are designed:

  1. Condition A (Baseline): A single agent receives the task description and the base prompt and completes diagram generation on its own, serving as the evaluation baseline.
  2. Condition B (Same-Model Review): The Worker session runs continuously, with an additional same-model Critic session (persistent rather than rebuilt on each round) reviewing the SVG and providing feedback.
  3. Condition C (External Review): The Worker session runs continuously, and each review calls an external GPT model (gpt-5.4-pro), incorporating prior reviews to provide a third-party perspective.
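The three conditions above can be captured as a small configuration table. A minimal Python sketch, where the class, field, and condition names are illustrative rather than taken from the project:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Condition:
    """One experimental condition in the Worker-Critic comparison (illustrative)."""
    name: str
    critic: Optional[str]     # None = no critic (baseline condition)
    persistent_critic: bool   # reuse one Critic session across review rounds

# The three conditions described above, expressed as data:
CONDITIONS: List[Condition] = [
    Condition("A-baseline", critic=None, persistent_critic=False),
    Condition("B-same-model-review", critic="same-model", persistent_critic=True),
    Condition("C-external-review", critic="gpt-5.4-pro", persistent_critic=True),
]
```

Expressing the conditions as data rather than branching logic makes it easy to run all three in parallel and to add further conditions (e.g. a multi-Critic variant) later.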

Section 04

Technical Implementation Details

  1. Prompt Engineering: Modular design that separates the base prompt from additional instructions and combines them dynamically via scripts to produce the final prompt.
  2. Multi-Platform Support: Compatible with OpenAI Codex (launch_codex_exec.py) and Anthropic Claude (launch_claude_exec.py), each with its own startup script and runner.
  3. Isolated Environment: Each run lives in an independent temporary directory (/tmp/worker-critic-example-runs//) with its own git repository, supporting parallel runs and complete log retention.
  4. Figma Integration: Optional; reads and writes Figma files via an MCP server, aborting if the permission pre-check fails.
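Two of the points above, modular prompt composition and isolated run directories, can be sketched in a few lines. `compose_prompt` and `make_run_dir` are hypothetical helpers illustrating the described design, not the project's actual scripts:

```python
import subprocess
import tempfile
from pathlib import Path
from typing import List

def compose_prompt(base_path: str, extra_instructions: List[str]) -> str:
    """Combine the base prompt file with task-specific extra instructions.

    Mirrors the modular prompt design described above; the separator and
    file handling are assumptions.
    """
    base = Path(base_path).read_text(encoding="utf-8")
    # Drop empty instruction strings, then join the parts with blank lines.
    parts = [base.strip()] + [s.strip() for s in extra_instructions if s.strip()]
    return "\n\n".join(parts)

def make_run_dir(prefix: str = "worker-critic-run-") -> str:
    """Create an isolated temp directory with its own git repo for one run."""
    run_dir = tempfile.mkdtemp(prefix=prefix)
    subprocess.run(["git", "init", "--quiet", run_dir], check=True)
    return run_dir
```

Because each run gets a fresh directory and repository, conditions can execute in parallel without interfering, and every intermediate artifact stays inspectable afterwards.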

Section 05

Implementation of the Review Mechanism

  1. External Review Script: scripts/external_review.py takes the project description, the current SVG, and prior reviews, calls the OpenAI API, and outputs a detailed Markdown review plus a JSON structured summary.
  2. Historical Records: Reviews are saved in runs//reviews/; subsequent reviews can include this history to preserve context continuity.
  3. Claude Review: scripts/anthropic_review.py calls the Claude model on Azure Foundry, supporting multi-model selection and recording compatibility information.
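The shape of an external review request can be illustrated with a small helper. The message structure and instruction wording below are assumptions about what scripts/external_review.py might send, not its actual code:

```python
from typing import Dict, List

def build_review_request(
    project_desc: str, svg: str, history: List[Dict[str, str]]
) -> List[Dict[str, str]]:
    """Assemble chat messages for an external Critic review (hypothetical sketch)."""
    # Fold prior reviews into a single text block so the Critic sees the
    # full feedback history, as the project's review scripts are described to do.
    history_text = "\n\n".join(
        f"Review {i + 1}:\n{r['summary']}" for i, r in enumerate(history)
    ) or "(no prior reviews)"
    system = (
        "You are a critic reviewing an SVG technical diagram. "
        "Return a detailed Markdown review followed by a JSON summary."
    )
    user = (
        f"Project description:\n{project_desc}\n\n"
        f"Prior reviews:\n{history_text}\n\n"
        f"Current SVG:\n{svg}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]
```

Passing the accumulated review history on every call is what keeps a stateless external model from repeating earlier feedback or contradicting itself across rounds.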

Section 06

Result Collection and Comparative Analysis

scripts/build_comparison_artifacts.py collects the final diagrams from the three conditions and generates:

  • Side-by-side comparison PNGs
  • Iteration process GIFs
  • A summary report listing the run root directory, per-condition frame counts, and artifact paths, making the differences between conditions visible at a glance.
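The summary report's assembly can be sketched as follows; the function and field names are illustrative rather than taken from build_comparison_artifacts.py:

```python
import json
from typing import Dict

def build_summary(
    run_root: str, frame_counts: Dict[str, int], artifact_paths: Dict[str, str]
) -> str:
    """Serialize the comparison summary described above as JSON (field names assumed)."""
    report = {
        "run_root": run_root,            # directory containing all three runs
        "frame_counts": frame_counts,    # iteration frames per condition
        "artifact_paths": artifact_paths # e.g. side-by-side PNG, iteration GIF
    }
    return json.dumps(report, indent=2)
```

A machine-readable summary like this lets later analysis scripts locate every artifact without re-scanning the run directories.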

Section 07

Engineering Best Practices and Research Value

Best practices: prompt version control, environment isolation, multi-platform abstraction, complete log recording, and observability by design (real-time observation via tmux). Research value: testable prompt strategies, comparison of review-mode effects, study of the impact of Critic feedback, and extension to other tasks. Industrial application scenarios: document writing, code generation, design-draft creation, and other tasks requiring high-quality iteration.


Section 08

Limitations and Future Directions

Current limitations: compatibility issues with specific model names. Future directions: supporting more AI platforms, introducing a multi-Critic voting mode, and investigating the drift problem of the Critic itself.