# From Intent to Execution: A Multi-Agent Workflow Auto-Orchestration Framework Based on Agent Recommendation

> This article introduces a framework for constructing multi-agent systems automatically. Using an LLM-driven planner, a dynamic call graph, and a two-stage agent recommendation system, it turns manual workflow orchestration into an automated process, significantly improving both the recall rate of agent selection and system robustness.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-05-05T17:08:26.000Z
- Last activity: 2026-05-06T03:52:04.463Z
- Popularity: 129.3
- Keywords: multi-agent systems, agent recommendation, workflow orchestration, LLM planning, information retrieval, automation framework, task decomposition
- Page link: https://www.zingnex.cn/en/forum/thread/llm-arxiv-2605-03986v1
- Canonical: https://www.zingnex.cn/forum/thread/llm-arxiv-2605-03986v1

---

## Introduction: Multi-Agent Workflow Auto-Orchestration Framework Based on Agent Recommendation

This article proposes a framework that automates the construction of multi-agent systems (MAS). It targets three pain points of building MAS by hand: manual planning, complex agent selection, and tedious execution graph construction. Its core modules, including an LLM-driven planner, a dynamic call graph, and a two-stage agent recommendation system, turn manual workflow orchestration into an automated process. The reported result is a significantly higher recall rate for agent selection and better system robustness, which moves MAS development from manual crafting toward automated assembly.

## Three Major Dilemmas in Multi-Agent System Construction

Current multi-agent system development faces three challenges:
1. **Manual Planning**: Developers must design execution plans by hand, predict the inputs and outputs of each step, and handle edge cases. This is time-consuming, error-prone, and hard to adapt when requirements change;
2. **Complex Agent Selection**: The number of available agents is growing rapidly, and each has its own capability boundaries, performance characteristics, and costs, so evaluating and selecting agents by hand becomes a heavy burden;
3. **Tedious Execution Graph Construction**: Assembling call graphs requires a large amount of boilerplate code (parameter mapping, error handling, etc.), which consumes significant development time.
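
To make the third point concrete, here is a minimal sketch of the kind of wiring a call graph needs: resolving dependency order, mapping each step's outputs to the inputs of later steps, and wrapping failures. The step names, the `STEPS` table, and `run_graph` are hypothetical and are not taken from the framework. The sketch only illustrates the glue that is otherwise written by hand for every workflow.

```python
from graphlib import TopologicalSorter

# Hypothetical steps; each receives a dict of its dependencies' results.
def load(inputs):
    return [3, 1, 2]

def sort_values(inputs):
    return sorted(inputs["load"])

def summarize(inputs):
    values = inputs["sort"]
    return {"min": values[0], "max": values[-1]}

# Hypothetical call graph: step name -> dependencies and callable.
STEPS = {
    "load": {"deps": [], "run": load},
    "sort": {"deps": ["load"], "run": sort_values},
    "summarize": {"deps": ["sort"], "run": summarize},
}

def run_graph(steps):
    """Run steps in dependency order, passing results downstream."""
    graph = {name: spec["deps"] for name, spec in steps.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        spec = steps[name]
        # Parameter mapping: collect the outputs this step depends on.
        inputs = {dep: results[dep] for dep in spec["deps"]}
        try:
            results[name] = spec["run"](inputs)
        except Exception as exc:
            # Error handling: report which step failed.
            raise RuntimeError(f"step {name!r} failed") from exc
    return results

results = run_graph(STEPS)
print(results["summarize"])
```

Even in this toy form, the dependency table, the input mapping, and the error wrapping are all code that says nothing about the task itself, which is the overhead the article describes.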

## Analysis of Five Core Modules of the Automated Framework

The framework consists of five closely cooperating modules:
1. **LLM-Driven Planner**: Receives a natural language intent and outputs a set of structured task descriptions, generating the task decomposition and the input/output specifications adaptively;
2. **Natural Language Task Description**: Expresses task goals, constraints, etc., with rich semantics, supporting semantic similarity matching;
3. **Dynamic Call Graph**: Explicitly represents task dependencies and can adjust execution paths (e.g., branches, loops) based on runtime conditions;
4. **Agent Orchestrator**: Maintains an agent registry and maps tasks to agents based on factors such as capability match and historical success rate;
5. **Two-Stage Agent Recommendation System**: Fast retrieval (vector matching with an embedding model to filter candidates) followed by LLM reranking (fine-grained semantic matching to order them), balancing efficiency and effectiveness.
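
The two-stage recommendation in item 5 can be sketched as follows. This is a toy illustration under stated assumptions, not the framework's implementation: the `AGENTS` registry is hypothetical, `embed` is a bag-of-words stand-in for a real embedding model, and `toy_reranker` is a placeholder for the LLM call that would score each task and agent pair in the second stage.

```python
import math
from collections import Counter

# Hypothetical registry: agent name -> capability description.
AGENTS = {
    "sql_analyst": "run sql queries and aggregate tabular sales data",
    "chart_maker": "draw charts and plots from tabular data",
    "code_writer": "generate python code from a specification",
    "translator": "translate text between english and chinese",
}

def embed(text):
    """Toy bag-of-words vector; stand-in for an embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(task, agents, k):
    """Stage 1: cheap vector match that keeps the top-k candidates."""
    query = embed(task)
    ranked = sorted(
        agents,
        key=lambda name: cosine(query, embed(agents[name])),
        reverse=True,
    )
    return ranked[:k]

def toy_reranker(task, name, description):
    """Stage 2 placeholder; a real system would ask an LLM for a score."""
    task_terms = set(task.lower().split())
    return len(task_terms & set(description.split())) / len(task_terms)

def recommend(task, agents, k=3, top_n=1, score=toy_reranker):
    """Filter with stage 1, then order the survivors with stage 2."""
    candidates = retrieve(task, agents, k)
    ranked = sorted(
        candidates,
        key=lambda name: score(task, name, agents[name]),
        reverse=True,
    )
    return ranked[:top_n]

print(recommend("aggregate sales data with sql queries", AGENTS))
```

The point of the split is cost: the first stage touches every agent but is cheap, while the expensive scorer only sees the `k` candidates that survive it.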

## Experimental Exploration: Component Optimization and Key Findings

The research team optimized the framework components through experiments:
1. **Embedding Model Selection**: Domain-specific embedding models (e.g., code/tool description fine-tuned models) outperform general models;
2. **Reranking Strategy**: Having the LLM generate its reasoning (Chain-of-Thought) improves accuracy and interpretability, while introducing negative samples improves its ability to discriminate between candidates;
3. **Agent Description Enhancement**: Historical logs are analyzed to extract the scenarios in which an agent succeeded or failed, and these are used to enrich its description, which improves matching recall;
4. **Critic Agent**: Reviews the recommendation results along four dimensions (task coverage, redundancy, risk diversification, and cost-effectiveness) to further improve recall.
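
Two of the critic's review dimensions, task coverage and redundancy, can be illustrated with a small rule-based check. This is only a sketch under assumptions: the source describes the critic as an agent and does not specify its logic, the `capabilities` mapping and the `critique` function are hypothetical, and risk diversification and cost-effectiveness are not modeled here.

```python
def critique(tasks, recommended, capabilities):
    """Report uncovered tasks and redundant agents in a recommendation.

    tasks: list of task ids to be covered
    recommended: list of agent names, in recommendation order
    capabilities: hypothetical dict, agent name -> set of task ids
    """
    wanted = set(tasks)
    covered = set()
    redundant = []
    for agent in recommended:
        handles = capabilities.get(agent, set()) & wanted
        if handles <= covered:
            # The agent adds no task that earlier agents do not cover.
            redundant.append(agent)
        covered |= handles
    uncovered = [task for task in tasks if task not in covered]
    return {"uncovered": uncovered, "redundant": redundant}

report = critique(
    tasks=["query", "plot", "report"],
    recommended=["sql_analyst", "sql_helper", "chart_maker"],
    capabilities={
        "sql_analyst": {"query"},
        "sql_helper": {"query"},
        "chart_maker": {"plot"},
    },
)
print(report)
```

An uncovered task is a direct signal that recall can still be improved, since it points to a needed agent that the recommendation stage missed.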

## End-to-End Test Results: Validation of Recall Rate and Robustness

End-to-end benchmark tests cover scenarios such as data analysis and code generation. The results show:
1. **Higher Recall Rate**: The recall rate of agent selection is significantly higher than that of existing methods;
2. **Robustness and Scalability**: Response time remains sub-second as the number of agents grows, and selection quality stays stable when task descriptions vary or contain noise, making the framework suitable for production deployment.
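
For readers unfamiliar with the metric, recall for agent selection can be computed as below. The summary does not state the exact definition used in the benchmark, so this sketch assumes the standard information-retrieval form (the share of truly relevant agents that appear among the top `k` recommendations). The function name and the sample agent names are hypothetical.

```python
def recall_at_k(recommended, relevant, k):
    """Fraction of relevant agents found in the top-k recommendations."""
    relevant_set = set(relevant)
    if not relevant_set:
        raise ValueError("relevant must contain at least one agent")
    hits = set(recommended[:k]) & relevant_set
    return len(hits) / len(relevant_set)

# Two of the three relevant agents appear in the top three.
score = recall_at_k(
    recommended=["sql_analyst", "translator", "chart_maker", "code_writer"],
    relevant=["sql_analyst", "chart_maker", "report_writer"],
    k=3,
)
print(round(score, 3))
```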

## Industry Implications and Future Outlook

**Industry Implications**:
1. Agent capability descriptions need standardized specifications;
2. The developer's role shifts from "writing call code" to "defining task intent";
3. A new human-machine collaboration model emerges: the framework handles matching and orchestration, while humans focus on intent clarification and quality control.

**Future Outlook**: The auto-orchestration framework will become a key component of AI infrastructure, and design ideas such as two-stage recommendation and dynamic call graph may become industry standard practices.