# mohdel: Design Philosophy and Practice of a Self-Hosted Multi-Provider LLM Gateway

> An LLM gateway focused on inference primitives rather than orchestration, providing a stable, observable unified interface for multiple providers in production environments through process isolation and native OpenTelemetry support.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-04-28T19:12:49.000Z
- Last activity: 2026-04-28T19:22:04.603Z
- Popularity: 150.8
- Keywords: LLM, gateway, OpenTelemetry, multi-provider, self-hosted, inference, rust, observability
- Page link: https://www.zingnex.cn/en/forum/thread/mohdel-provider-llm
- Canonical: https://www.zingnex.cn/forum/thread/mohdel-provider-llm
- Markdown source: floors_fallback

---

mohdel is a self-hosted, multi-provider LLM gateway that focuses on inference primitives rather than orchestration. It gives production systems a stable, observable, unified interface to multiple providers through process isolation and native OpenTelemetry support. Its core design philosophy is "scope-capping": it deliberately leaves out orchestration, automatic retry and fallback, and response caching, so that the caller keeps full control.

## Background: Why Do We Need Another LLM Gateway?

In the current LLM ecosystem, developers face a dilemma. Using each provider's SDK directly means managing the complexity of multiple vendors, while adopting an orchestration framework such as LangChain introduces abstraction layers beyond what the application needs. mohdel takes a middle path: it handles only inference primitives, not orchestration, and leaves control with the caller.

## Project Positioning: Scope-Capping at the Inference Primitive Layer

mohdel's core design philosophy is "scope-capping". It explicitly does not take on the following roles:
- Not an orchestrator: no chaining, agent logic, or memory management (these are left to frameworks such as LangChain);
- Not a retry/fallback engine: it classifies errors but never retries or switches models automatically;
- No response caching: only provider-side caching is supported;
- No context window management: the caller decides what goes into the prompt;
- Not a SaaS proxy: it is fully self-hosted, and the user controls the API keys and infrastructure.
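
Because mohdel classifies errors but leaves the reaction to the caller, retry and fallback policy live in application code. A minimal sketch of such a caller-side policy might look like the following. The error class names, the `decide` function, and the backoff values are illustrative assumptions and are not part of mohdel's API.

```typescript
// Hypothetical error classes; mohdel's real classification names may differ.
type ErrorClass = "rate_limited" | "overloaded" | "invalid_request" | "auth";

type Decision =
  | { action: "retry"; model: string; delayMs: number }
  | { action: "fallback"; model: string }
  | { action: "fail" };

// Caller-owned policy: the gateway reports the error class, the caller decides.
// `attempt` is the 1-based number of attempts already made on `model`.
function decide(
  errorClass: ErrorClass,
  attempt: number,
  model: string,
  fallbacks: string[],
  maxAttempts: number = 3,
): Decision {
  const transient = errorClass === "rate_limited" || errorClass === "overloaded";
  // Non-transient errors (bad request, bad credentials) will not improve on retry.
  if (!transient) return { action: "fail" };
  if (attempt < maxAttempts) {
    // Exponential backoff: 500 ms, 1000 ms, 2000 ms, ...
    return { action: "retry", model, delayMs: 500 * 2 ** (attempt - 1) };
  }
  // Attempts exhausted: switch to the first fallback model, if any.
  const [next] = fallbacks;
  if (next === undefined) return { action: "fail" };
  return { action: "fallback", model: next };
}
```

Keeping this decision in a pure function makes the policy easy to unit-test and to change per call site, which is the kind of control scope-capping is meant to preserve.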

## Architecture Design: Three-Plane Isolation for Stability

mohdel uses a three-plane architecture to achieve process isolation:
1. JS Client: communicates with the backend over a Unix socket, supporting HTTP callers in any language;
2. Rust Thin-Gate: the scheduler and state owner, responsible for session management and quota control;
3. JS Session: the actual provider executor; each session runs independently.

The thin-gate can run as a subprocess for fault isolation, or be called inline within a single process.
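
To make the gate's role as state owner more concrete, here is a minimal sketch of per-provider quota admission. It is written in TypeScript for consistency with the other examples, although the real thin-gate is written in Rust. The `Gate` class, its method names, and the "maximum concurrent sessions per provider" quota model are all assumptions made for illustration; the source does not describe how mohdel actually defines or enforces quotas.

```typescript
// Hypothetical gate state: active session count per provider, capped by a quota.
class Gate {
  private readonly active = new Map<string, number>();
  private readonly maxPerProvider: number;

  constructor(maxPerProvider: number) {
    this.maxPerProvider = maxPerProvider;
  }

  // Admit a session if the provider is under quota; return false otherwise.
  tryOpen(provider: string): boolean {
    const n = this.active.get(provider) ?? 0;
    if (n >= this.maxPerProvider) return false;
    this.active.set(provider, n + 1);
    return true;
  }

  // Release a slot when a session ends, whether it finished or crashed.
  close(provider: string): void {
    const n = this.active.get(provider) ?? 0;
    if (n > 0) this.active.set(provider, n - 1);
  }

  // Current number of admitted sessions for a provider.
  activeSessions(provider: string): number {
    return this.active.get(provider) ?? 0;
  }
}
```

The point of the sketch is the ownership boundary: sessions do the provider work, while a single component holds the counters, so a crashed session only needs its slot released rather than any shared state repaired.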

## Observability: Native OpenTelemetry Support

Each call in mohdel automatically produces:
- OpenTelemetry span: a `mohdel.session.answer` span carrying GenAI semantic attributes (model, token usage, etc.) and mohdel-specific attributes;
- Trace-linked logs: stderr log lines carry correlation fields such as `traceId` and `spanId`;
- Gate-side OTLP metrics: number of active sessions, call statistics, latency distribution, etc.

Setting `OTEL_EXPORTER_OTLP_ENDPOINT` enables automatic export of spans and metrics; when it is not set, there is zero overhead.
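
The shape of this telemetry can be sketched as follows. The attribute keys come from the OpenTelemetry GenAI semantic conventions; which keys mohdel actually sets, the `CallInfo` type, both function names, and the JSON layout of the log line are assumptions for illustration only.

```typescript
// Hypothetical summary of one completed call.
interface CallInfo {
  model: string;
  inputTokens: number;
  outputTokens: number;
}

// Build span attributes using OpenTelemetry GenAI semantic-convention keys.
function spanAttributes(call: CallInfo): Record<string, string | number> {
  return {
    "gen_ai.request.model": call.model,
    "gen_ai.usage.input_tokens": call.inputTokens,
    "gen_ai.usage.output_tokens": call.outputTokens,
  };
}

// Format one log line that carries the trace context, so a log backend
// can join it to the matching span. The JSON layout is an assumption.
function logLine(traceId: string, spanId: string, message: string): string {
  return JSON.stringify({ traceId, spanId, message });
}

const attrs = spanAttributes({
  model: "gemini-3-flash-preview",
  inputTokens: 12,
  outputTokens: 34,
});
const line = logLine(
  "4bf92f3577b34da6a3ce929d0e0e4736", // 16-byte trace ID as 32 hex characters
  "00f067aa0ba902b7",                 // 8-byte span ID as 16 hex characters
  "answer complete",
);
```

Carrying `traceId` and `spanId` on every log line is what lets a backend pivot from a slow span to the exact log output of that call.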

## Usage and Integration: Multi-Provider Support and Toolchain

**Supported Providers**: 11 providers are currently supported, including Anthropic, OpenAI, and Gemini. Model IDs use the `<provider>/<model>` format (e.g., `gemini/gemini-3-flash-preview`).

**CLI Tool**: After installation, interact through the `mo` command, e.g., `mo ask anthropic/claude-sonnet-4-6 "explain monads"` or `cat article.txt | mo ask openai/gpt-5.4 "summarize in 3 bullets"`. The CLI supports streaming output, effort control, and more.

**Integration Paths**:
- Client mode (recommended for cross-process use): communication over a Unix socket;
- Factory mode (quick setup for a single process): inline calls.
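
Since every model is addressed as `<provider>/<model>`, caller code often needs to split an ID, for example to route metrics by provider. The helper below is a small sketch of that; `parseModelId` and `ModelId` are hypothetical names, and splitting on the first slash (so that any later slashes stay in the model part) is an assumption about the format.

```typescript
interface ModelId {
  provider: string;
  model: string;
}

// Split "<provider>/<model>" on the first slash and reject malformed IDs.
function parseModelId(id: string): ModelId {
  const slash = id.indexOf("/");
  // Reject a missing slash, an empty provider, or an empty model.
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`expected "<provider>/<model>", got "${id}"`);
  }
  return { provider: id.slice(0, slash), model: id.slice(slash + 1) };
}

const parsedId = parseModelId("gemini/gemini-3-flash-preview");
// parsedId.provider is "gemini", parsedId.model is "gemini-3-flash-preview"
```

Validating the ID before a call turns a typo such as a bare `gpt-5.4` into an immediate local error instead of a provider-side failure.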

## Practical Significance and Summary

**Applicable Scenarios**:
- You need a unified multi-provider interface but do not want a heavyweight orchestration framework;
- You have strict observability requirements;
- You want a simple architecture with a clear division of responsibilities;
- You need process-level fault isolation in production.

**Summary**: mohdel embodies the "less is more" design philosophy. By drawing clear capability boundaries, it keeps control with the caller while providing production-grade observability and stability. It can serve as the underlying inference primitive for frameworks such as LangChain, or as the base layer for self-built orchestration logic.