Zing Forum

mohdel: Design Philosophy and Practice of a Self-Hosted Multi-Provider LLM Gateway

An LLM gateway focused on inference primitives rather than orchestration, providing a stable, observable unified interface for multiple providers in production environments through process isolation and native OpenTelemetry support.

Tags: LLM, gateway, OpenTelemetry, multi-provider, self-hosted, inference, rust, observability
Published 2026-04-29 03:12 · Recent activity 2026-04-29 03:22 · Estimated read 6 min

Section 01

mohdel: Design Philosophy and Practice of a Self-Hosted Multi-Provider LLM Gateway

mohdel is a self-hosted multi-provider LLM gateway focused on inference primitives rather than orchestration. It provides a stable, observable, unified interface to multiple providers in production environments through process isolation and native OpenTelemetry support. Its core design philosophy is "scope-capping": it deliberately leaves out orchestration, retry/fallback, caching, and similar features so that the caller keeps full control.

Section 02

Background: Why Do We Need Another LLM Gateway?

In the current LLM ecosystem, developers face a dilemma: using each provider's SDK directly means managing the complexity of multiple vendors, while adopting an orchestration framework like LangChain introduces abstraction layers beyond what is needed. mohdel takes a middle path: it focuses only on inference primitives, not orchestration, so that control stays with the caller.

Section 03

Project Positioning: Scope-Capping at the Inference Primitive Layer

mohdel's core design philosophy is "scope-capping", explicitly avoiding the following:

  • Not an orchestrator: No chained calls, agent logic, memory management, etc. (left to frameworks like LangChain);
  • Not a retry/fallback engine: Classifies errors but does not automatically retry or switch models;
  • No response caching: Only supports provider-side caching;
  • No context window management: The caller decides the prompt content;
  • Not a SaaS proxy: Fully self-hosted, with API keys and infrastructure controlled by the user.
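
Because the gateway classifies errors but leaves retry and model switching to the caller, that policy lives in application code. The following is a minimal sketch of what such a caller-side decision function might look like. The error class names, the `Policy` shape, and `nextAction` are all hypothetical and chosen for illustration; mohdel's actual error categories may differ.

```typescript
// Hypothetical error classes; the gateway's real classification may use other names.
type ErrorClass = "rate_limited" | "overloaded" | "invalid_request" | "auth";

type Action =
  | { kind: "retry"; model: string; delayMs: number }
  | { kind: "fallback"; model: string }
  | { kind: "fail" };

interface Policy {
  models: string[];    // preference order of <provider>/<model> IDs
  maxRetries: number;  // retries allowed per model before moving on
  baseDelayMs: number; // first backoff delay; doubles on each retry
}

// Decide what the caller does after a classified failure.
// modelIndex: which entry of policy.models just failed.
// attempt: how many retries were already spent on that model.
function nextAction(
  err: ErrorClass,
  modelIndex: number,
  attempt: number,
  policy: Policy,
): Action {
  const current = policy.models[modelIndex];
  if (current === undefined) return { kind: "fail" };

  // Only transient errors are worth retrying; the rest fail immediately.
  const transient = err === "rate_limited" || err === "overloaded";
  if (!transient) return { kind: "fail" };

  if (attempt < policy.maxRetries) {
    const delayMs = policy.baseDelayMs * Math.pow(2, attempt);
    return { kind: "retry", model: current, delayMs };
  }

  // Retries exhausted: move to the next model if one is configured.
  const next = policy.models[modelIndex + 1];
  return next !== undefined ? { kind: "fallback", model: next } : { kind: "fail" };
}
```

Keeping the decision as a pure function makes the policy easy to unit-test and swap out, which is the kind of control the scope-capping approach is meant to leave with the caller.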

Section 04

Architecture Design: Three-Plane Isolation for Stability

mohdel uses a three-plane architecture to achieve process isolation:

  1. JS Client: Communicates with the backend via Unix Socket, supporting HTTP callers in any language;
  2. Rust Thin-Gate: Scheduler and state owner, responsible for session management and quota control;
  3. JS Session: Actual provider executor, each session runs independently.

This design supports running thin-gate as a subprocess for fault isolation, or inline calls within a single process.
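
To make the gate's role as "state owner" concrete, here is a small in-memory model of session admission under a quota. It is only an illustration written in TypeScript for readability: the real thin-gate is a Rust process, and the `Gate` class, its method names, and the `quota_exceeded` reason are assumptions, not mohdel's API.

```typescript
// Hypothetical model of the gate: it owns the set of live sessions and
// refuses new ones once a configured quota is reached.
type Admission =
  | { ok: true; sessionId: number }
  | { ok: false; reason: "quota_exceeded" };

class Gate {
  private nextId = 1;
  private active = new Set<number>();
  private maxSessions: number;

  constructor(maxSessions: number) {
    this.maxSessions = maxSessions;
  }

  // Admit a new session only while the quota has room.
  open(): Admission {
    if (this.active.size >= this.maxSessions) {
      return { ok: false, reason: "quota_exceeded" };
    }
    const sessionId = this.nextId++;
    this.active.add(sessionId);
    return { ok: true, sessionId };
  }

  // Release a slot when a session finishes or its executor dies.
  // Returns false if the session was not known to the gate.
  close(sessionId: number): boolean {
    return this.active.delete(sessionId);
  }

  get activeSessions(): number {
    return this.active.size;
  }
}
```

Since the executors hold no shared state in this model, a crashed session only needs a `close` call on the gate side, which is the property that process isolation is meant to provide.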

Section 05

Observability: Native OpenTelemetry Support

Each call in mohdel automatically generates:

  • OpenTelemetry Span: Creates a mohdel.session.answer span, including GenAI semantic attributes (model, token usage, etc.) and mohdel-specific attributes;
  • Trace-linked Logs: stderr logs carry associated information like traceId and spanId;
  • Gate-side OTLP Metrics: Number of active sessions, call statistics, latency distribution, etc.

Setting OTEL_EXPORTER_OTLP_ENDPOINT enables automatic reporting of spans and metrics; zero overhead when not set.
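
As a rough sketch of what the telemetry described above might carry, the helpers below build a span attribute set and a trace-linked log line, and check whether export is enabled. The `gen_ai.*` keys come from the OpenTelemetry GenAI semantic conventions; the `mohdel.session_id` key, the JSON log shape, and all function names are assumptions made for illustration, not mohdel's documented output.

```typescript
type Env = Record<string, string | undefined>;

// Export is treated as enabled only when the standard OTLP endpoint
// variable is present and non-empty.
function telemetryEnabled(env: Env): boolean {
  const endpoint = env["OTEL_EXPORTER_OTLP_ENDPOINT"];
  return endpoint !== undefined && endpoint.trim() !== "";
}

interface CallInfo {
  model: string;
  inputTokens: number;
  outputTokens: number;
  sessionId: string;
}

// Attributes for a span such as mohdel.session.answer.
function spanAttributes(call: CallInfo): Record<string, string | number> {
  return {
    "gen_ai.request.model": call.model,
    "gen_ai.usage.input_tokens": call.inputTokens,
    "gen_ai.usage.output_tokens": call.outputTokens,
    "mohdel.session_id": call.sessionId, // hypothetical gateway-specific key
  };
}

// One log line that can be joined back to its trace via traceId and spanId.
function logLine(message: string, traceId: string, spanId: string): string {
  return JSON.stringify({ message, traceId, spanId });
}
```

In a Node.js process, `telemetryEnabled(process.env)` would be the natural call; passing the environment in as a parameter keeps the check testable.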

Section 06

Usage and Integration: Multi-Provider Support and Toolchain

Supported Providers: Currently supports 11 providers, including Anthropic, OpenAI, and Gemini. Model IDs use the <provider>/<model> format (e.g., gemini/gemini-3-flash-preview).

CLI Tool: After installation, interact via the mo command, which supports streaming output, effort control, etc. For example:

  mo ask anthropic/claude-sonnet-4-6 "explain monads"
  cat article.txt | mo ask openai/gpt-5.4 "summarize in 3 bullets"

Integration Paths:

  • Client mode (recommended for cross-process): Communication via Unix Socket;
  • Factory mode (quick for single process): Inline calls.
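
The <provider>/<model> ID format described above is simple to validate on the caller side before a request is sent. A minimal sketch follows; `parseModelId` and `ModelRef` are hypothetical names, and splitting on the first slash (so that a model name could itself contain one) is an assumption rather than documented behavior.

```typescript
interface ModelRef {
  provider: string;
  model: string;
}

// Split a model ID of the form <provider>/<model> at the first slash.
// Throws if either side is empty or the slash is missing.
function parseModelId(id: string): ModelRef {
  const slash = id.indexOf("/");
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`expected <provider>/<model>, got "${id}"`);
  }
  return { provider: id.slice(0, slash), model: id.slice(slash + 1) };
}

const ref = parseModelId("gemini/gemini-3-flash-preview");
console.log(ref.provider, ref.model);
```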

Section 07

Practical Significance and Summary

Applicable Scenarios:

  • You need a unified multi-provider interface but don't want a heavyweight orchestration framework;
  • You have strict observability requirements;
  • You want to keep the architecture simple, with a clear division of responsibilities;
  • You need process-level fault isolation in production environments.

Summary: mohdel embodies the "less is more" design philosophy. By drawing clear capability boundaries, it keeps control with the caller while providing production-grade observability and stability. It can serve as the underlying inference primitive for frameworks like LangChain, or as a base layer for self-built orchestration logic.