Zing Forum


Hybrid Agent Workflow: Collaborative Practice of SLM and LLM Under Microsoft Agent Framework

This project demonstrates how to build a hybrid agent workflow with the Microsoft Agent Framework. Through five collaboration modes, it combines the strengths of local small language models (SLMs) and cloud-based large language models (LLMs), balancing latency, privacy, and cost.

Microsoft Agent Framework, Hybrid Agents, SLM, LLM, Local Inference, Cloud LLMs, Task Decomposition, AI Architecture, Cost Optimization
Published 2026-05-21 17:40 · Recent activity 2026-05-21 17:50 · Estimated read 6 min

Section 01

Hybrid Agent Workflow: Collaborative Practice of SLM and LLM Under Microsoft Agent Framework (Introduction)

This article introduces a hybrid agent workflow project built on the Microsoft Agent Framework. It addresses the dilemma enterprises face when deploying AI: cloud LLMs bring high cost, high latency, and privacy risks, while local SLMs have limited capabilities. Through five collaboration modes, the project combines the strengths of SLMs and LLMs, balancing latency, privacy, and cost.


Section 02

Project Background and Core Concepts

When deploying AI, enterprises face a tension between cloud LLMs (highly capable, but costly, slow to respond, and a privacy risk) and local SLMs (lightweight and efficient, but weaker on complex tasks). Author Filip W observed that developers often overlook the value of edge computing: many simple queries do not need GPT-4-level capabilities. The project is built on the Microsoft Agent Framework, which supports both Python and .NET, and its core concept is "intelligent routing and layered processing": SLMs handle simple tasks, complex tasks are escalated to LLMs, and the system dynamically balances performance, cost, and privacy.


Section 03

Detailed Explanation of Collaboration Modes (1): SLM Default with Fallback and Predictive Routing

The project implements five collaboration modes, each grounded in a published paper:

  1. SLM Default with LLM Fallback: A local SLM (e.g., Phi-4-mini-instruct) handles the request first; if its confidence in the result is insufficient, the request is escalated to a cloud LLM. Suitable for high-frequency, low-complexity scenarios (see arXiv:2510.03847).
  2. Predictive Routing: A lightweight router model classifies each task as weak (SLM-suitable) or strong (LLM-required) and routes it directly to the matching model, avoiding the wasted SLM attempt that fallback incurs on hard tasks. Suitable for scenarios with clearly distinct task types (see arXiv:2406.18665).
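The two modes above can be sketched in plain Python. This is a minimal, framework-agnostic sketch, not the project's code and not the Microsoft Agent Framework API: the `Answer` type, the confidence score, the 0.7 threshold, and the stub models and keyword router are all hypothetical stand-ins for real SLM/LLM clients and a trained router.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Answer:
    text: str
    confidence: float  # hypothetical 0.0-1.0 score attached to each answer
    source: str        # which tier produced the answer: "slm" or "llm"


# Any callable from prompt to Answer; real clients would be wrapped to fit this.
Model = Callable[[str], Answer]


def slm_default_with_fallback(prompt: str, slm: Model, llm: Model,
                              threshold: float = 0.7) -> Answer:
    """Mode 1: try the local SLM first, escalate to the cloud LLM on low confidence."""
    first = slm(prompt)
    if first.confidence >= threshold:
        return first
    return llm(prompt)  # the SLM call was already spent; routing avoids this cost


def predictive_route(prompt: str, router: Callable[[str], str],
                     slm: Model, llm: Model) -> Answer:
    """Mode 2: a lightweight router labels the task, and exactly one model is called."""
    return slm(prompt) if router(prompt) == "weak" else llm(prompt)


# --- Hypothetical stubs for demonstration only ---
def fake_slm(prompt: str) -> Answer:
    # Pretend the small model is only confident on short prompts.
    return Answer(f"slm: {prompt}", 0.9 if len(prompt) < 40 else 0.3, "slm")


def fake_llm(prompt: str) -> Answer:
    return Answer(f"llm: {prompt}", 0.95, "llm")


def keyword_router(prompt: str) -> str:
    # A toy rule standing in for a trained router model.
    return "strong" if "analyze" in prompt.lower() else "weak"


print(slm_default_with_fallback("What is 2+2?", fake_slm, fake_llm).source)  # slm
print(predictive_route("Analyze last quarter's churn by region", keyword_router,
                       fake_slm, fake_llm).source)  # llm
```

The trade-off is visible in the code: fallback may pay for two calls on hard prompts, while routing always pays for exactly one call but depends on the router labeling tasks correctly.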

Section 04

Detailed Explanation of Collaboration Modes (2): MAKER, MINIONS, and Agent Chain

The remaining three modes:

  3. MAKER Protocol: A cloud LLM decomposes a complex task into atomic subtasks, which a cluster of local SLMs executes in parallel; the results converge through voting. Suitable for multi-step reasoning tasks (see arXiv:2511.09030).
  4. MINIONS Protocol: A long document is split into fragments, local models extract information from them in parallel, and a cloud LLM summarizes the extracts. Because only the extracts, not the full document, reach the cloud, this protects privacy while remaining efficient (see arXiv:2502.15964).
  5. Agent Chain: Local SLMs are connected in series and process documents sequentially, accumulating context that an LLM then synthesizes. Suitable for progressive reasoning (see arXiv:2406.02818).
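The three protocols above share a fan-out/fan-in shape, which the following sketch illustrates with plain callables. It is a simplified illustration under stated assumptions, not the MAKER, MINIONS, or Agent Chain algorithms as specified in the cited papers: the planner, worker, extractor, and synthesizer functions are hypothetical placeholders for cloud-LLM and local-SLM calls, voting is a simple majority, and fragments are fixed-size character slices.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

TextFn = Callable[[str], str]


def majority_vote(answers: List[str]) -> str:
    """Most common answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


def maker_style(task: str, decompose: Callable[[str], List[str]],
                workers: List[TextFn]) -> List[str]:
    """Mode 3: a cloud planner splits the task; SLM workers vote on each subtask."""
    results = []
    with ThreadPoolExecutor() as pool:
        for sub in decompose(task):
            # Bind the current subtask via a default argument for each worker call.
            votes = list(pool.map(lambda w, s=sub: w(s), workers))
            results.append(majority_vote(votes))
    return results


def split_fragments(text: str, size: int) -> List[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]


def minions_style(document: str, question: str,
                  extract: Callable[[str, str], str],
                  summarize: Callable[[str, List[str]], str],
                  size: int = 500) -> str:
    """Mode 4: local extraction per fragment; only the extracts reach the cloud LLM."""
    fragments = split_fragments(document, size)
    with ThreadPoolExecutor() as pool:
        extracts = list(pool.map(lambda f: extract(question, f), fragments))
    return summarize(question, extracts)


def agent_chain(documents: List[str], step: Callable[[str, str], str],
                synthesize: TextFn) -> str:
    """Mode 5: each SLM step receives the accumulated context plus the next document."""
    context = ""
    for doc in documents:
        context = step(context, doc)
    return synthesize(context)


# Toy usage: two of three "workers" agree, so the majority wins per subtask.
print(maker_style("a b", str.split, [str.upper, str.upper, str.lower]))  # ['A', 'B']
```

In `minions_style`, the full `document` is only ever passed to `extract` (the local side), which is where the privacy benefit comes from: the cloud-side `summarize` sees extracts, not the raw text.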


Section 05

Technical Implementation and Multi-Platform Support

The project provides Python and .NET implementations:

  • Python: Supports MLX (optimized for Apple Silicon) and Foundry Local (cross-platform) backends, switchable via environment variables.
  • .NET: Supports Ollama (local), OpenAI-compatible endpoints, and Azure AI Foundry (cloud); the SLM and LLM backends can be configured independently.

Configuration: short model aliases simplify cross-platform setup, and sensitive information is managed through environment variables so that it does not leak into code or configuration files.
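The configuration approach (short aliases plus environment variables) can be sketched as follows. The environment variable names (`SLM_BACKEND`, `SLM_MODEL`, `LLM_API_KEY`, and so on), the backend labels, and the alias table are hypothetical; the project's actual variable names and model identifiers may differ.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional

# Hypothetical alias table: one short name, resolved per backend.
# The identifiers on the right are placeholders, not real model IDs.
ALIASES = {
    "phi-4-mini": {
        "mlx": "placeholder-mlx-phi-4-mini",
        "foundry_local": "placeholder-foundry-phi-4-mini",
    },
}


@dataclass(frozen=True)
class BackendConfig:
    backend: str
    model: str
    # Secrets come only from the environment and are hidden from repr/logs.
    api_key: Optional[str] = field(default=None, repr=False)


def load_config(prefix: str, env: Mapping[str, str] = os.environ) -> BackendConfig:
    """Read <prefix>_BACKEND, <prefix>_MODEL and <prefix>_API_KEY (hypothetical names)."""
    backend = env.get(f"{prefix}_BACKEND", "mlx")
    alias = env.get(f"{prefix}_MODEL", "phi-4-mini")
    # Known aliases map to a backend-specific ID; anything else passes through unchanged.
    model = ALIASES.get(alias, {}).get(backend, alias)
    return BackendConfig(backend, model, env.get(f"{prefix}_API_KEY"))


# SLM and LLM tiers are configured independently via different prefixes.
slm = load_config("SLM", {"SLM_BACKEND": "foundry_local", "SLM_MODEL": "phi-4-mini"})
llm = load_config("LLM", {"LLM_BACKEND": "azure_foundry",
                          "LLM_MODEL": "my-deployment", "LLM_API_KEY": "secret"})
print(slm)
print(llm)  # api_key is omitted from the printed form
```

Passing `env` explicitly keeps the loader testable; by default it reads the real process environment, so no secret needs to appear in source code.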

Section 06

Practical Application Value and Best Practices

Project Value:

  • For Developers: Provides ready-to-use architecture templates (e.g., an enterprise knowledge base that combines predictive routing with MINIONS).
  • For Architects: Shows how research results translate into engineering practice, with each mode annotated with its source paper for deeper study.
  • For Product Managers: Offers cost/performance trade-off cases, with quantitative comparisons of latency, cost, and accuracy to inform model selection.

Section 07

Future Outlook and Community Contributions

The project will continue to track the latest versions of the Microsoft Agent Framework (it is currently based on RC4). The community can contribute new modes or improve existing implementations by submitting issues or pull requests on GitHub. As edge AI chips mature and SLM capabilities improve, the article expects hybrid agent architectures to become mainstream in enterprise AI applications, and positions this project as an early practical reference for that trend.