# Tenchi-MCP: A Hybrid Cloud-Edge LLM Inference Orchestrator Based on the MCP Protocol

> Tenchi-MCP is an open-source hybrid inference orchestration tool that seamlessly integrates cloud-based large language models (LLMs) with local Ollama models via the Model Context Protocol (MCP), enabling intelligent task distribution and balancing cost optimization, data privacy protection, and inference efficiency.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-05-19T05:40:49.000Z
- Last activity: 2026-05-19T05:48:09.977Z
- Popularity: 156.9
- Keywords: MCP, LLM, Ollama, hybrid inference, local models, cloud models, Rust, privacy protection, cost optimization, Claude Code, Gemini CLI
- Page link: https://www.zingnex.cn/en/forum/thread/tenchi-mcp-mcpllm
- Canonical: https://www.zingnex.cn/forum/thread/tenchi-mcp-mcpllm

---

## Tenchi-MCP Guide: An Open-Source Solution for Hybrid Cloud-Edge LLM Inference Orchestration

Tenchi-MCP is an open-source hybrid inference orchestration tool based on the MCP protocol. By integrating cloud-based large models (such as Gemini and Claude) with local Ollama models, it enables intelligent task distribution and balances cost optimization, data privacy protection, and inference efficiency. Key advantages include zero-intrusion integration with mainstream AI development tools, flexible multi-model role configuration, and offline support.

## Project Background and Core Contradictions

As LLMs become part of everyday development workflows, developers face a trade-off. Cloud-based models are powerful, but they carry high token costs and data privacy risks. Local Ollama models are free to run and keep data on the machine, but their inference speed is limited by the available hardware and they lack a standardized integration interface. Tenchi-MCP (Tian Di-MCP), written in Rust, aims to resolve this tension by orchestrating cloud and edge models through a single interface, the MCP protocol.

## Technical Architecture and Core Mechanisms

### MCP Protocol and Zero-Intrusion Integration
MCP is an open protocol launched by Anthropic that standardizes how AI models interact with external tools. Tenchi-MCP runs as an MCP server, so MCP-capable clients such as Claude Code and Gemini CLI can reach local models without any change to the developer's existing workflow.
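
To make the protocol layer concrete, here is a minimal sketch of the kind of message an MCP client sends to a server when it invokes a tool. MCP is built on JSON-RPC 2.0 and uses the `tools/call` method for this. The tool name `ask_local_model` and its argument keys are hypothetical, chosen only for illustration; they are not taken from Tenchi-MCP.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool name and argument keys, for illustration only.
request = build_tool_call(
    1,
    "ask_local_model",
    {"role": "coder", "prompt": "Review this function for bugs."},
)
print(json.dumps(request, indent=2))
```

Because the client only ever speaks this generic message format, the server behind it can be swapped or extended without touching the client's configuration beyond registering the server.
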
### Intelligent Task Distribution Strategy
The roles and task descriptions of the local models are declared in `models_config.toml`. Using these descriptions, the cloud-side agent decides on its own where to route each task: for example, a sensitive code review goes to a local Qwen Coder model, while general Q&A stays with the cloud model. This balances security against performance.
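
The routing idea can be sketched as a small function. This is a simplified stand-in, not Tenchi-MCP's logic: in the tool itself the decision is made by the cloud agent reading the role descriptions, whereas here a hypothetical `sensitive` flag on a hypothetical `Task` type takes its place.

```python
from dataclasses import dataclass


@dataclass
class Task:
    description: str
    sensitive: bool = False  # hypothetical flag: task touches private code or data


def route(task: Task) -> str:
    """Return "local" or "cloud" for a task.

    Stand-in for the agent's own judgement: sensitive work stays on the
    local model, everything else goes to the cloud model.
    """
    return "local" if task.sensitive else "cloud"


print(route(Task("Review internal billing code", sensitive=True)))  # local
print(route(Task("Explain what a mutex is")))  # cloud
```
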
### Multi-Model Role-Based Configuration
Tenchi-MCP supports defining roles such as Coder (low temperature for more deterministic output), Expert (moderate temperature to balance creativity and accuracy), and Lite (small context window for resource-constrained environments). Each role can set its own system prompt, sampling parameters, and hardware resource allocation.
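
As a rough illustration of role-based configuration, the sketch below maps three roles to request bodies for Ollama's `POST /api/generate` endpoint, which accepts `system`, `stream`, and an `options` object with keys such as `temperature` and `num_ctx`. The `Role` type, the model tags, the prompts, and the numeric values are all assumptions made for this example; they are not Tenchi-MCP's defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    model: str  # Ollama model tag (example values below)
    system_prompt: str
    temperature: float
    num_ctx: int  # context window, in tokens


# Illustrative values only; tune them for your own models and hardware.
ROLES = {
    "coder": Role("qwen2.5-coder", "You are a careful code reviewer.", 0.1, 8192),
    "expert": Role("llama3", "You are a knowledgeable assistant.", 0.6, 8192),
    "lite": Role("qwen2.5:0.5b", "Answer briefly.", 0.3, 2048),
}


def build_generate_payload(role_name: str, prompt: str) -> dict:
    """Build a request body for Ollama's POST /api/generate endpoint."""
    role = ROLES[role_name]
    return {
        "model": role.model,
        "prompt": prompt,
        "system": role.system_prompt,
        "stream": False,
        "options": {"temperature": role.temperature, "num_ctx": role.num_ctx},
    }


payload = build_generate_payload("coder", "Find the bug in this loop.")
print(payload["model"], payload["options"])
```

Keeping the sampling parameters on the role rather than on each request means a caller only has to name the role it wants.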

## Practical Application Scenarios and Value

### Privacy-Sensitive Development Scenarios
When a task involves proprietary enterprise code or other sensitive data, it can be handled by local inference, so the data is processed on the developer's machine and is not sent to a cloud provider.
### Cost Optimization
Delegating simple tasks (such as code formatting and syntax checking) to local models can reportedly cut cloud costs by 30% to 60%; the actual saving depends on how much of the workload can be delegated.
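
A back-of-the-envelope calculation shows how the saving scales with the share of work moved to local models. The token volume and price below are made-up inputs, and the sketch assumes a single flat price per token, which real pricing (separate input and output rates) does not follow exactly.

```python
def cloud_cost_saving(total_tokens, local_share, price_per_million):
    """Return (cost_before, cost_after, fraction_saved) for a given workload.

    local_share is the fraction of tokens handled by local models (0.0 to 1.0).
    """
    before = total_tokens / 1_000_000 * price_per_million
    after = before * (1 - local_share)
    saved = (before - after) / before if before else 0.0
    return before, after, saved


# Hypothetical inputs: 50M tokens per month, 40% handled locally, $3 per 1M tokens.
before, after, saved = cloud_cost_saving(50_000_000, 0.4, 3.0)
print(f"{before:.2f} -> {after:.2f} ({saved:.0%} saved)")
```

Under this flat-price assumption the fraction saved simply equals the fraction of tokens delegated, so a 30% to 60% saving corresponds to moving roughly that share of token traffic to local models.
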
### Offline Support
Tenchi-MCP automatically switches to local models when the network is unstable, so development can continue without interruption.
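
The fallback behaviour can be sketched as a try-cloud-then-local wrapper. This is only an illustration of the pattern; how Tenchi-MCP detects an unstable network is not described here, and the function names `infer_with_fallback`, `flaky_cloud`, and `local_model` are hypothetical.

```python
def infer_with_fallback(prompt, cloud_call, local_call):
    """Try the cloud model first; fall back to the local one on network errors."""
    try:
        return "cloud", cloud_call(prompt)
    except (ConnectionError, TimeoutError):
        # Network is down or too slow: keep working with the local model.
        return "local", local_call(prompt)


def flaky_cloud(prompt):
    # Simulates a cloud endpoint that cannot be reached.
    raise ConnectionError("network unreachable")


def local_model(prompt):
    # Simulates a local model that always answers.
    return f"[local] {prompt}"


print(infer_with_fallback("format this file", flaky_cloud, local_model))
# ('local', '[local] format this file')
```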

## Installation and Configuration Practice

### Installation Methods
- **Gemini CLI**: `gemini extensions install https://github.com/DovahkiinYuzuko/Tenchi-MCP --ref v0.1.2`
- **Claude Code/Codex CLI**: Clone the repository and compile: `git clone https://github.com/DovahkiinYuzuko/Tenchi-MCP && cd Tenchi-MCP && cargo build --release`
### Configuration File Structure
`models_config.toml` includes: global configuration (Ollama address, timeout), model definitions (roles, priorities), inference parameters (temperature, etc.), and resource control (GPU layers, CPU threads).

## Limitations and Notes

- Local inference speed depends on hardware: a 70B-parameter model running on a consumer-grade CPU may take tens of seconds to respond.
- Cross-platform verification: the current version has mainly been verified on Windows 11; macOS and Linux support has not yet been tested on real machines.

## Summary and Outlook

Tenchi-MCP enables cloud-edge collaboration through intelligent orchestration, giving developers a practical tool for cost control, privacy protection, and offline availability. As local models (such as Llama 3 and Qwen2.5) grow more capable and the MCP ecosystem matures, hybrid inference is expected to become mainstream in AI-assisted development. For developers who care about data sovereignty and cost, Tenchi-MCP is worth exploring.
