Zing Forum

Tenchi-MCP: A Hybrid Cloud-Edge LLM Inference Orchestrator Based on the MCP Protocol

Tenchi-MCP is an open-source hybrid inference orchestration tool that seamlessly integrates cloud-based large language models (LLMs) with local Ollama models via the Model Context Protocol (MCP), enabling intelligent task distribution and balancing cost optimization, data privacy protection, and inference efficiency.

Tags: MCP, LLM, Ollama, hybrid inference, local models, cloud models, Rust, privacy protection, cost optimization, Claude Code
Published 2026-05-19 13:40 | Recent activity 2026-05-19 13:48 | Estimated read 6 min

Section 01

Tenchi-MCP Guide: An Open-Source Solution for Hybrid Cloud-Edge LLM Inference Orchestration

Tenchi-MCP is an open-source hybrid inference orchestration tool built on the MCP protocol. It connects cloud-hosted large models (such as Gemini and Claude) with local Ollama models and distributes tasks between them to balance cost, data privacy, and inference efficiency. Its key advantages are zero-intrusion integration with mainstream AI development tools (existing workflows stay unchanged), flexible role-based configuration of multiple models, and offline support.

Section 02

Project Background and Core Trade-offs

As LLMs become part of everyday development workflows, developers face a trade-off. Cloud-based models are powerful but carry high token costs and data privacy risks. Local Ollama models are free to run and keep data on the machine, but their inference speed is limited by local hardware and they lack a standardized integration interface. Tenchi-MCP (Tian Di-MCP), written in Rust, aims to resolve this tension by orchestrating cloud and edge models through a single MCP interface.

Section 03

Technical Architecture and Core Mechanisms

MCP Protocol and Zero-Intrusion Integration

MCP is an open protocol launched by Anthropic that standardizes interactions between AI models and external tools. As an MCP server, Tenchi-MCP supports mainstream tools like Claude Code and Gemini CLI, allowing developers to connect to local models without modifying their existing workflows.
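
To make the protocol layer concrete, here is a minimal sketch of the JSON-RPC 2.0 message an MCP client sends when it invokes a server tool through the `tools/call` method. The tool name `ask_local_model` and its `prompt` argument are assumptions for illustration only; the tools Tenchi-MCP actually exposes are whatever it advertises in its `tools/list` response. The JSON is assembled by hand to keep the example dependency-free; a real client would likely use a JSON library.

```rust
/// Escape characters that may not appear raw inside a JSON string.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build an MCP `tools/call` request (JSON-RPC 2.0).
/// `tool` and the `prompt` argument are hypothetical names.
fn tools_call_request(id: u64, tool: &str, prompt: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"tools/call","params":{{"name":"{}","arguments":{{"prompt":"{}"}}}}}}"#,
        id,
        json_escape(tool),
        json_escape(prompt)
    )
}
```

Calling `tools_call_request(1, "ask_local_model", "Review this diff")` yields a single-line JSON object that a client could write to the server's transport (for example, stdio).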

Intelligent Task Distribution Strategy

The roles and task descriptions of local models are declared in models_config.toml. The cloud-side agent reads these descriptions and decides on its own where to route each task: for example, sensitive code reviews go to a local Qwen Coder model, while general Q&A stays on the cloud model. This balances security and performance.
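
The routing policy described above can be sketched as a small function. In Tenchi-MCP the decision is made by the cloud model itself from the role descriptions; the rule-based `route` function below is only a stand-in that makes the policy explicit. The `Task`, `TaskKind`, and `Target` types and the role names are hypothetical.

```rust
/// Where a task ends up. The role names are illustrative.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Target {
    Local(&'static str), // role of a local Ollama model
    Cloud,
}

#[derive(Debug, Clone, Copy)]
enum TaskKind {
    CodeReview,
    GeneralQa,
}

/// A task plus a flag saying whether it touches private code or data.
struct Task {
    kind: TaskKind,
    sensitive: bool,
}

/// Rule-based stand-in for the agent's decision:
/// sensitive work never leaves the machine, everything else goes to the cloud.
fn route(task: &Task) -> Target {
    match (task.kind, task.sensitive) {
        (TaskKind::CodeReview, true) => Target::Local("coder"),
        (TaskKind::GeneralQa, true) => Target::Local("expert"),
        (_, false) => Target::Cloud,
    }
}
```

A sensitive code review maps to `Target::Local("coder")`, while a non-sensitive general question maps to `Target::Cloud`.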

Multi-Model Role-Based Configuration

Tenchi-MCP supports defining roles such as Coder (low temperature for more deterministic output), Expert (moderate temperature to balance creativity and accuracy), and Lite (small context window for resource-constrained environments). Each role can have its own system prompt, sampling parameters, and hardware resource allocation.
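
As a rough illustration of the three roles, the sketch below models each one as a set of presets. The struct, its field names, and every numeric value are assumptions chosen to reflect the descriptions above (low temperature for Coder, moderate for Expert, small context for Lite); they are not taken from the project's configuration.

```rust
/// Per-role settings. Field names and values are illustrative only.
#[derive(Debug, Clone, PartialEq)]
struct RoleConfig {
    name: &'static str,
    system_prompt: &'static str,
    temperature: f32,
    context_window: u32, // in tokens
}

/// Look up a hypothetical preset by role name.
fn preset(role: &str) -> Option<RoleConfig> {
    match role {
        "coder" => Some(RoleConfig {
            name: "coder",
            system_prompt: "You are a careful code reviewer.",
            temperature: 0.1, // low: favor repeatable output
            context_window: 8192,
        }),
        "expert" => Some(RoleConfig {
            name: "expert",
            system_prompt: "You are a domain expert.",
            temperature: 0.7, // moderate: some creativity
            context_window: 8192,
        }),
        "lite" => Some(RoleConfig {
            name: "lite",
            system_prompt: "Answer briefly.",
            temperature: 0.5,
            context_window: 2048, // small: fits constrained hardware
        }),
        _ => None,
    }
}
```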

Section 04

Practical Application Scenarios and Value

Privacy-Sensitive Development Scenarios

When handling private enterprise code or sensitive data, tasks routed to a local model are processed entirely on the developer's machine and are never sent to a cloud provider, which removes that path for data leakage.

Cost Optimization

Delegating simple tasks (code formatting, syntax checking) to local models can cut cloud costs by an estimated 30%-60%, depending on how much of the workload is simple enough to delegate.
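
The arithmetic behind such an estimate is simple if we assume a flat per-token cloud price and treat local inference as free. Under those assumptions the saving equals the share of tokens handled locally. The functions and the sample figures below are hypothetical and only illustrate the calculation.

```rust
/// Cloud bill in USD for `tokens` tokens at a flat `usd_per_million` rate.
fn cloud_cost(tokens: u64, usd_per_million: f64) -> f64 {
    tokens as f64 / 1_000_000.0 * usd_per_million
}

/// Fraction of the cloud bill saved when `local_tokens` out of
/// `total_tokens` are served by a local model instead of the cloud.
fn savings_fraction(total_tokens: u64, local_tokens: u64) -> f64 {
    if total_tokens == 0 {
        return 0.0;
    }
    // Clamp so that a bad input cannot report more than 100% savings.
    local_tokens.min(total_tokens) as f64 / total_tokens as f64
}
```

For example, with 10 million tokens per month at an assumed 3 USD per million, moving 4 million tokens to a local model reduces the bill from 30 USD to 18 USD, a 40% saving that falls inside the range quoted above.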

Offline Support

Tenchi-MCP automatically switches to local models when the network is unstable, so development can continue without interruption.
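
A fallback of this kind can be sketched as "try the cloud, and on failure answer locally". The function below is a minimal illustration under that assumption; the backends are passed in as closures so the example stays self-contained, and none of the names come from the Tenchi-MCP codebase.

```rust
/// Which backend produced the answer.
#[derive(Debug, PartialEq)]
enum Served {
    Cloud(String),
    Local(String),
}

/// Try the cloud backend first; if it reports an error (for example a
/// network failure), fall back to the local backend.
fn infer_with_fallback<C, L>(prompt: &str, cloud: C, local: L) -> Served
where
    C: Fn(&str) -> Result<String, String>,
    L: Fn(&str) -> String,
{
    match cloud(prompt) {
        Ok(answer) => Served::Cloud(answer),
        Err(_) => Served::Local(local(prompt)),
    }
}
```

A production version would likely add a timeout and distinguish transient network errors from permanent ones, so that a rejected request is not silently retried on a weaker model.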

Section 05

Installation and Configuration Practice

Installation Methods

  • Gemini CLI: gemini extensions install https://github.com/DovahkiinYuzuko/Tenchi-MCP --ref v0.1.2
  • Claude Code/Codex CLI: Clone the repository and compile: git clone https://github.com/DovahkiinYuzuko/Tenchi-MCP && cd Tenchi-MCP && cargo build --release

Configuration File Structure

models_config.toml includes: global configuration (Ollama address, timeout), model definitions (roles, priorities), inference parameters (temperature, etc.), and resource control (GPU layers, CPU threads).
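
The four groups of settings listed above could map onto Rust types roughly as follows. This is a sketch of the shape only: every type and field name is an assumption, as is the rule that a larger `priority` value wins. The one concrete value, `http://localhost:11434`, is Ollama's default listen address.

```rust
/// Global settings: where Ollama listens and how long to wait for it.
#[derive(Debug, Clone)]
struct GlobalConfig {
    ollama_url: String,
    timeout_secs: u64,
}

/// Sampling parameters for one model.
#[derive(Debug, Clone)]
struct InferenceParams {
    temperature: f32,
}

/// Hardware limits for one model.
#[derive(Debug, Clone)]
struct ResourceLimits {
    gpu_layers: u32,
    cpu_threads: u32,
}

/// One model definition: its role, priority, parameters, and limits.
#[derive(Debug, Clone)]
struct ModelDef {
    role: String,
    priority: u8, // assumption: larger value means preferred
    params: InferenceParams,
    resources: ResourceLimits,
}

#[derive(Debug, Clone)]
struct ModelsConfig {
    global: GlobalConfig,
    models: Vec<ModelDef>,
}

impl ModelsConfig {
    /// The model with the highest priority, if any are defined.
    fn preferred(&self) -> Option<&ModelDef> {
        self.models.iter().max_by_key(|m| m.priority)
    }
}

/// A hypothetical configuration with two local models.
fn example_config() -> ModelsConfig {
    ModelsConfig {
        global: GlobalConfig {
            ollama_url: "http://localhost:11434".to_string(),
            timeout_secs: 120,
        },
        models: vec![
            ModelDef {
                role: "coder".to_string(),
                priority: 10,
                params: InferenceParams { temperature: 0.1 },
                resources: ResourceLimits { gpu_layers: 32, cpu_threads: 8 },
            },
            ModelDef {
                role: "lite".to_string(),
                priority: 1,
                params: InferenceParams { temperature: 0.5 },
                resources: ResourceLimits { gpu_layers: 0, cpu_threads: 4 },
            },
        ],
    }
}
```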

Section 06

Limitations and Notes

  • Local inference speed depends on hardware: a 70B-parameter model running on a consumer-grade CPU may take tens of seconds to respond.
  • Cross-platform verification: the current version has mainly been verified on Windows 11; macOS and Linux support has not yet been tested on real hardware.

Section 07

Summary and Outlook

Tenchi-MCP enables cloud-edge collaboration through intelligent orchestration, providing a practical tool for cost control, privacy protection, and offline availability. As local models (such as Llama 3 and Qwen2.5) become more capable and the MCP ecosystem matures, hybrid inference is expected to become a mainstream approach in AI-assisted development. For developers who care about data sovereignty and cost control, Tenchi-MCP is worth exploring.