# Worker-Critic Mode: Engineering Practice of AI Agent Collaborative Workflow

> An example project demonstrating the Worker-Critic agent workflow architecture, exploring best practices for multi-agent collaboration in generating high-quality technical diagrams through comparative experiments under three conditions: baseline, same-model review, and external review.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-04-07T13:16:48.000Z
- Last activity: 2026-04-07T13:23:20.477Z
- Popularity: 159.9
- Keywords: Worker-Critic mode, AI Agent, multi-agent collaboration, prompt engineering, Codex, Claude, quality review, experiment framework
- Page link: https://www.zingnex.cn/en/forum/thread/worker-critic-ai
- Canonical: https://www.zingnex.cn/forum/thread/worker-critic-ai
- Markdown source: floors_fallback

---

## Introduction

The worker-critic-example project, open-sourced by PredictiveScienceLab, implements the Worker-Critic agent workflow on a diagram generation task. It compares three experimental conditions to explore best practices for multi-agent collaboration in producing high-quality technical diagrams, and it offers a reusable experimental setup for research on multi-agent collaboration mechanisms.

## Project Background and Core Issues

As large models have become more capable, agent architectures have seen wider use, but a single agent is prone to "drift": as context accumulates, it deviates from its original goal. The Worker-Critic mode borrows from code review and adds an independent Critic agent that checks the quality of the Worker's output. The project uses a concrete diagram generation task and three experimental conditions to evaluate quantitatively how much the mode actually helps.

## Experimental Design: Three Comparative Conditions

The project defines three experimental conditions:
1. Condition A (Baseline): A single agent receives the task description and the base prompt and generates the diagram on its own. This serves as the evaluation baseline.
2. Condition B (Same-Model Review): The Worker session runs continuously, and a second session of the same model acts as the Critic. The Critic session is persistent rather than rebuilt for each review; it reviews the SVG and returns feedback.
3. Condition C (External Review): The Worker session runs continuously, and each review calls an external GPT model (gpt-5.4-pro) with the earlier reviews included, which provides a third-party perspective.
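
The control flow shared by the three conditions can be sketched as a single loop. This is a minimal illustration, not the project's code: the `Worker` and `Critic` callables, `RunResult`, `run_condition`, and the stub agents are all hypothetical names, and treating the baseline as a single pass is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical signatures: a worker maps (task, feedback) -> SVG text,
# a critic maps (svg, earlier reviews) -> review text.
Worker = Callable[[str, Optional[str]], str]
Critic = Callable[[str, List[str]], str]


@dataclass
class RunResult:
    svgs: List[str] = field(default_factory=list)
    reviews: List[str] = field(default_factory=list)


def run_condition(task: str, worker: Worker, critic: Optional[Critic], rounds: int = 3) -> RunResult:
    """Run one condition. critic=None models Condition A; a critic callable
    models Condition B or C, depending on which model backs it."""
    result = RunResult()
    feedback: Optional[str] = None
    for _ in range(rounds):
        svg = worker(task, feedback)
        result.svgs.append(svg)
        if critic is None:
            break  # assumed baseline behavior: one pass, no review
        feedback = critic(svg, list(result.reviews))  # critic sees earlier reviews
        result.reviews.append(feedback)
    return result


# Stub agents so the sketch runs without any model access.
def stub_worker(task: str, feedback: Optional[str]) -> str:
    note = "initial" if feedback is None else "revised"
    return f"<svg><!-- {task}: {note} --></svg>"


def stub_critic(svg: str, history: List[str]) -> str:
    return f"review {len(history) + 1}: tighten labels"


baseline = run_condition("pipeline diagram", stub_worker, None)
reviewed = run_condition("pipeline diagram", stub_worker, stub_critic, rounds=3)
print(len(baseline.svgs), len(reviewed.svgs), len(reviewed.reviews))  # 1 3 3
```

In this framing, Conditions B and C differ only in what sits behind the critic callable: a persistent session of the same model, or a call to an external model.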

## Technical Implementation Details

1. Prompt Engineering: Prompts are modular. The base prompt is kept separate from additional instructions, and scripts combine them dynamically into the final prompt.
2. Multi-Platform Support: The project supports OpenAI Codex (`launch_codex_exec.py`) and Anthropic Claude (`launch_claude_exec.py`), each with its own launch script and runner.
3. Isolated Environment: Each run gets its own temporary directory (`/tmp/worker-critic-example-runs/<run-id>/`) with its own git repository, which allows parallel runs and keeps complete logs.
4. Figma Integration: Optional. Figma files are read and written through an MCP server, and the run aborts if the permission pre-check fails.
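
Items 1 and 3 above can be illustrated together. The sketch below assumes one plausible way to assemble a prompt from parts and to create a per-run directory with its own git repository; `compose_prompt`, `create_run_dir`, the `PROMPT.md` file name, and the run-id scheme are hypothetical and are not taken from the project's scripts.

```python
import shutil
import subprocess
import tempfile
import uuid
from pathlib import Path
from typing import List


def compose_prompt(base: str, extras: List[str]) -> str:
    """Join the base prompt with any non-empty additional instruction blocks."""
    parts = [base.strip()] + [e.strip() for e in extras if e.strip()]
    return "\n\n".join(parts) + "\n"


def create_run_dir(root: Path, prompt: str) -> Path:
    """Create <root>/<run-id>/ holding the prompt and, if git is available, a repo."""
    run_id = uuid.uuid4().hex[:8]  # hypothetical run-id scheme
    run_dir = root / run_id
    run_dir.mkdir(parents=True)
    (run_dir / "PROMPT.md").write_text(prompt, encoding="utf-8")  # hypothetical file name
    if shutil.which("git"):  # skipped when git is not installed, so the sketch still runs
        subprocess.run(["git", "init", "--quiet"], cwd=run_dir, check=True)
    return run_dir


base = "Draw the system architecture as an SVG diagram."
critic_extra = "After each draft, wait for the Critic's review and revise."

# A throwaway root is used here instead of /tmp/worker-critic-example-runs/.
with tempfile.TemporaryDirectory() as tmp:
    run_a = create_run_dir(Path(tmp), compose_prompt(base, []))
    run_b = create_run_dir(Path(tmp), compose_prompt(base, [critic_extra, "   "]))
    print(run_a.name != run_b.name)
    print((run_b / "PROMPT.md").read_text(encoding="utf-8"))
```

Because every run owns its directory and repository, several conditions can run in parallel without overwriting each other's files or logs.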

## Implementation of the Review Mechanism

1. External Review Script: `scripts/external_review.py` takes the project description, the SVG, and the earlier reviews, then calls the OpenAI API to produce a detailed Markdown review and a structured JSON summary.
2. Historical Records: Reviews are saved under `runs/<run-id>/reviews/`, so later reviews can include the history and keep context continuous.
3. Claude Review: `scripts/anthropic_review.py` calls a Claude model on Azure Foundry. It supports selecting among several models and records compatibility information.
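
The history mechanism can be sketched as follows: each round loads the earlier Markdown reviews, hands them to the reviewer, and stores the new Markdown review and JSON summary side by side. The file naming, the `Reviewer` signature, and `stub_reviewer` are assumptions made for illustration; the stub stands in for the model call and does not reproduce the actual scripts or any API.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List, Tuple

# Hypothetical reviewer signature: (description, svg, history) -> (markdown, summary)
Reviewer = Callable[[str, str, List[str]], Tuple[str, Dict]]


def load_history(reviews_dir: Path) -> List[str]:
    """Return earlier Markdown reviews, oldest first."""
    return [p.read_text(encoding="utf-8") for p in sorted(reviews_dir.glob("review_*.md"))]


def record_review(reviews_dir: Path, description: str, svg: str, reviewer: Reviewer) -> Dict:
    """Run one review round with the full history and persist both outputs."""
    reviews_dir.mkdir(parents=True, exist_ok=True)
    history = load_history(reviews_dir)
    markdown, summary = reviewer(description, svg, history)
    stem = f"review_{len(history) + 1:03d}"  # zero-padded so sorting keeps round order
    (reviews_dir / f"{stem}.md").write_text(markdown, encoding="utf-8")
    (reviews_dir / f"{stem}.json").write_text(json.dumps(summary, indent=2), encoding="utf-8")
    return summary


def stub_reviewer(description: str, svg: str, history: List[str]) -> Tuple[str, Dict]:
    """Stand-in for a model call; a real reviewer would send all three inputs to an API."""
    round_no = len(history) + 1
    markdown = f"# Review {round_no}\n\nLabels overlap in the top-left corner.\n"
    summary = {"round": round_no, "prior_reviews": len(history), "issues": ["label overlap"]}
    return markdown, summary


with tempfile.TemporaryDirectory() as tmp:
    reviews = Path(tmp) / "reviews"
    first = record_review(reviews, "pipeline diagram", "<svg/>", stub_reviewer)
    second = record_review(reviews, "pipeline diagram", "<svg/>", stub_reviewer)
    print(first["prior_reviews"], second["prior_reviews"])  # 0 1
    print(sorted(p.name for p in reviews.iterdir()))
```

Keeping the history on disk rather than in a session is what lets a stateless external reviewer see what earlier rounds already flagged.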

## Result Collection and Comparative Analysis

`scripts/build_comparison_artifacts.py` collects the final diagrams from the three conditions and generates:
- Side-by-side comparison PNGs
- GIFs of the iteration process
- A summary report listing the run root directory, the number of frames, and the artifact paths

Together these make the differences between conditions visible at a glance.
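
The summary-report step can be sketched with the standard library alone; image and GIF rendering are left out. The `frames/*.svg` layout, the `summary.json` file name, the field names, and `build_summary` itself are hypothetical, and the frame counts in the demo are dummy values rather than experimental results.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict


def build_summary(run_roots: Dict[str, Path], out_dir: Path) -> Dict[str, dict]:
    """Collect per-condition frames and write a JSON summary report."""
    out_dir.mkdir(parents=True, exist_ok=True)
    summary: Dict[str, dict] = {}
    for condition, root in run_roots.items():
        frames = sorted(root.glob("frames/*.svg"))  # hypothetical per-run layout
        summary[condition] = {
            "run_root": str(root),
            "frame_count": len(frames),
            "final_diagram": str(frames[-1]) if frames else None,
        }
    (out_dir / "summary.json").write_text(json.dumps(summary, indent=2), encoding="utf-8")
    return summary


with tempfile.TemporaryDirectory() as tmp:
    roots = {}
    # Dummy runs: one directory per condition with a few placeholder frames.
    for condition, n_frames in {"baseline": 1, "same_model": 3, "external": 4}.items():
        frames_dir = Path(tmp) / condition / "frames"
        frames_dir.mkdir(parents=True)
        for i in range(n_frames):
            (frames_dir / f"frame_{i:03d}.svg").write_text("<svg/>", encoding="utf-8")
        roots[condition] = Path(tmp) / condition
    report = build_summary(roots, Path(tmp) / "comparison")
    print({name: entry["frame_count"] for name, entry in report.items()})
```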

## Engineering Best Practices and Research Value

Best practices: prompt version control, environment isolation, multi-platform abstraction, complete logging, and observability by design (runs can be watched in real time via tmux).

Research value: the framework makes prompt strategies testable, allows review modes to be compared, supports study of how Critic feedback affects the result, and can be extended to other tasks.

Industrial application scenarios: document writing, code generation, design draft creation, and other tasks that need iterative refinement toward high quality.

## Limitations and Future Directions

Current limitations: compatibility issues with specific model names.

Future directions: support for more AI platforms, a multi-Critic voting mode, and study of drift in the Critic itself.
