
Hybrid Agent Workflow: Collaborative Practice of SLM and LLM Under Microsoft Agent Framework

This project demonstrates how to build a hybrid agent workflow using the Microsoft Agent Framework, achieving complementary advantages between local small models (SLMs) and cloud-based large models (LLMs) through five collaboration modes, and striking a balance between latency, privacy, and cost.

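Tags: Microsoft Agent Framework, Hybrid Agents, SLM, LLM, Local Inference, Cloud LLMs, Task Decomposition, AI Architecture, Cost Optimization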

Published 2026-05-21 17:40 | Recent activity 2026-05-21 17:50 | Estimated read 6 min

Section 01

Hybrid Agent Workflow: Collaborative Practice of SLM and LLM Under Microsoft Agent Framework (Introduction)

This article introduces a hybrid agent workflow project built on the Microsoft Agent Framework. It targets a dilemma enterprises face when deploying AI: cloud LLMs bring high cost, high latency, and privacy risks, while local SLMs have limited capabilities. The project combines the strengths of both through five collaboration modes, balancing latency, privacy, and cost.

Section 02

Project Background and Core Concepts

Enterprises deploying AI face a tension between cloud LLMs (strong capabilities, but high cost, high latency, and privacy risks) and local SLMs (lightweight and efficient, but limited on complex tasks). The author, Filip W, observed that developers often overlook the value of edge computing: many simple queries do not require GPT-4-level capabilities. The project is built on the Microsoft Agent Framework, which is available for both Python and .NET. Its core concept is "intelligent routing and layered processing": simple tasks are handled by SLMs and complex tasks are escalated to LLMs, dynamically balancing performance, cost, and privacy.

Section 03

Detailed Explanation of Collaboration Modes (1): SLM Default with Fallback and Predictive Routing

The project implements five collaboration modes, each tied to a published paper:

1. SLM Default with LLM Fallback: A local SLM (e.g., Phi-4-mini-instruct) handles the request first; if the confidence of its result is insufficient, the request is escalated to a cloud LLM. Suitable for high-frequency, low-complexity scenarios (refer to arXiv:2510.03847).
2. Predictive Routing: A lightweight router model classifies each task as weak or strong and routes it directly to the SLM or the LLM, avoiding the wasted SLM call that a fallback incurs. Suitable for scenarios with distinct task types (refer to arXiv:2406.18665).
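
The two modes above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the project's code and not the Microsoft Agent Framework API: the `Answer` type, the 0.7 threshold, and the `toy_*` functions are all hypothetical stand-ins so the control flow can run without any model installed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Answer:
    text: str
    confidence: float  # assumed to lie in [0, 1]; how it is estimated is left open
    source: str        # "slm" or "llm"


# Hypothetical stand-in type for a real model client.
ModelFn = Callable[[str], Answer]


def answer_with_fallback(prompt: str, slm: ModelFn, llm: ModelFn,
                         threshold: float = 0.7) -> Answer:
    """Mode 1: try the local SLM first, escalate only on low confidence."""
    local = slm(prompt)
    if local.confidence >= threshold:
        return local
    return llm(prompt)


def answer_with_routing(prompt: str, router: Callable[[str], str],
                        slm: ModelFn, llm: ModelFn) -> Answer:
    """Mode 2: a router labels the task 'weak' or 'strong' before any model runs."""
    label = router(prompt)
    return slm(prompt) if label == "weak" else llm(prompt)


# Toy implementations (assumptions for illustration only).
def toy_slm(prompt: str) -> Answer:
    # Pretend the small model is only confident on short prompts.
    confidence = 0.9 if len(prompt.split()) <= 8 else 0.3
    return Answer(f"[slm] {prompt}", confidence, "slm")


def toy_llm(prompt: str) -> Answer:
    return Answer(f"[llm] {prompt}", 0.95, "llm")


def toy_router(prompt: str) -> str:
    # A keyword heuristic standing in for a trained router model.
    hard_markers = ("prove", "compare", "analyze", "plan")
    return "strong" if any(m in prompt.lower() for m in hard_markers) else "weak"
```

The difference in shape is the point: the fallback path always pays for one SLM call and sometimes a second LLM call, while the routing path pays for the router and then exactly one model call.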

Section 04

Detailed Explanation of Collaboration Modes (2): MAKER, MINIONS, and Agent Chain

The remaining 3 modes:

3. MAKER Protocol: Complex tasks are decomposed into atomic subtasks by a cloud LLM, then executed in parallel by a local SLM cluster and converged via voting. Suitable for multi-step reasoning tasks (refer to arXiv:2511.09030).
4. MINIONS Protocol: Long documents are split into fragments, local models extract information in parallel, and a cloud LLM summarizes. This protects privacy and is efficient (refer to arXiv:2502.15964).
5. Agent Chain: Local SLMs are connected in series to process documents sequentially, accumulating context before being synthesized by an LLM. Suitable for progressive reasoning (refer to arXiv:2406.02818).
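
The data flow of these three modes, as described above, can be sketched with plain callables. This is a simplified illustration and not the protocols from the cited papers or the project's implementation: `Model`, `decompose`, `extract`, `step`, and `synthesize` are hypothetical names, and fixed-size character chunking is an assumption made for brevity.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Model = Callable[[str], str]  # hypothetical stand-in for any SLM or LLM client


def maker(task: str, decompose: Callable[[str], List[str]],
          workers: List[Model]) -> List[str]:
    """MAKER-style: an LLM decomposes, several SLMs answer each subtask, majority wins."""
    results = []
    for subtask in decompose(task):
        with ThreadPoolExecutor() as pool:
            votes = list(pool.map(lambda worker: worker(subtask), workers))
        # Keep the most frequent answer for this subtask.
        results.append(Counter(votes).most_common(1)[0][0])
    return results


def minions(document: str, chunk_size: int, extract: Model, summarize: Model) -> str:
    """MINIONS-style: local extraction per fragment in parallel, then one cloud summary."""
    chunks = [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]
    with ThreadPoolExecutor() as pool:
        notes = list(pool.map(extract, chunks))
    # In this sketch only the extracted notes, not the raw document, reach the cloud model.
    return summarize("\n".join(notes))


def agent_chain(chunks: List[str], step: Callable[[str, str], str],
                synthesize: Model) -> str:
    """Agent chain: each SLM step sees its chunk plus the context accumulated so far."""
    context = ""
    for chunk in chunks:
        context = step(context, chunk)
    return synthesize(context)
```

In a real deployment `decompose`, `summarize`, and `synthesize` would wrap the cloud LLM, while `workers`, `extract`, and `step` would wrap local SLMs. The first two functions fan out in parallel; the chain is sequential by design because each step depends on the previous context.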

Section 05

Technical Implementation and Multi-Platform Support

The project provides Python and .NET implementations:

Python: Supports MLX (optimized for Apple Silicon) and Foundry Local (cross-platform) backends, switchable via environment variables.
.NET: Supports Ollama (local), OpenAI-compatible interfaces, and Azure AI Foundry (cloud). SLM/LLM backends can be configured independently.

Configuration: Short model aliases simplify cross-platform settings, and sensitive information is managed via environment variables to avoid leakage.
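
One way such configuration might look in Python is sketched below. The environment variable names (`HYBRID_SLM_BACKEND` and so on), the alias table, and the `BackendConfig` type are assumptions for illustration; the project's actual variable names and aliases are not given here and may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical alias table: short names mapped to full model identifiers.
MODEL_ALIASES = {
    "phi4-mini": "Phi-4-mini-instruct",
}


@dataclass(frozen=True)
class BackendConfig:
    backend: str
    model: str
    api_key: Optional[str]


def load_backend(role: str, default_backend: str, default_model: str,
                 env: Mapping[str, str] = os.environ) -> BackendConfig:
    """Read the settings for one role ('slm' or 'llm') from environment variables."""
    prefix = f"HYBRID_{role.upper()}"  # assumed naming scheme, not the project's
    alias = env.get(f"{prefix}_MODEL", default_model)
    return BackendConfig(
        backend=env.get(f"{prefix}_BACKEND", default_backend),
        model=MODEL_ALIASES.get(alias, alias),  # unknown aliases pass through unchanged
        api_key=env.get(f"{prefix}_API_KEY"),   # secrets stay out of source control
    )
```

Because each role is loaded separately, the SLM and LLM backends can be switched independently, for example by setting `HYBRID_SLM_BACKEND=mlx` on an Apple Silicon machine without touching the LLM settings.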

Section 06

Practical Application Value and Best Practices

Project Value:

For Developers: Provides directly applicable architecture templates (e.g., enterprise knowledge bases combining predictive routing and MINIONS).
For Architects: Demonstrates the transformation of academic results into engineering practice, with each mode annotated with papers for in-depth understanding.
For Product Managers: Provides cost-performance trade-off cases, with quantitative evaluation of latency, cost, and accuracy to support selection.
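
To make the cost side of that trade-off concrete, the expected cost per request of the fallback and routing modes can be written as two small formulas. This is a back-of-the-envelope sketch, not the project's evaluation: the cost units and all numbers are made up, and the routing formula assumes the router sends the same share of traffic to the LLM that the fallback would have escalated.

```python
def fallback_cost(slm_cost: float, llm_cost: float, escalation_rate: float) -> float:
    """Fallback mode: every request pays for the SLM, escalated ones also pay for the LLM."""
    return slm_cost + escalation_rate * llm_cost


def routing_cost(router_cost: float, slm_cost: float, llm_cost: float,
                 strong_share: float) -> float:
    """Routing mode: every request pays for the router, then for exactly one model."""
    return router_cost + (1 - strong_share) * slm_cost + strong_share * llm_cost


# Hypothetical cost units per request, chosen only to illustrate the comparison.
SLM_COST, LLM_COST, ROUTER_COST, HARD_SHARE = 1.0, 20.0, 0.1, 0.25

fallback = fallback_cost(SLM_COST, LLM_COST, HARD_SHARE)
routing = routing_cost(ROUTER_COST, SLM_COST, LLM_COST, HARD_SHARE)
```

Under these assumptions routing is cheaper whenever the router costs less than the SLM calls that fallback wastes on escalated requests, that is, when `router_cost < escalation_rate * slm_cost`. Accuracy and latency would need the same kind of accounting before choosing a mode.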

Section 07

Future Outlook and Community Contributions

The project will continue to track the latest version of the Microsoft Agent Framework (it is currently based on RC4). The community can contribute new modes or improve the implementations by submitting issues or pull requests on GitHub. As edge AI chips develop and SLM capabilities improve, hybrid agent architectures are expected to become mainstream in enterprise AI applications, and this project offers an early practical example of that trend.
