
mohdel: Design Philosophy and Practice of a Self-Hosted Multi-Provider LLM Gateway

A self-hosted LLM gateway focused on inference primitives rather than orchestration. Through process isolation and native OpenTelemetry support, it gives production systems a stable, observable, unified interface to multiple providers.

Tags: LLM, gateway, OpenTelemetry, multi-provider, self-hosted, inference, rust, observability

Published 2026-04-29 03:12

Section 01

mohdel: Design Philosophy and Practice of a Self-Hosted Multi-Provider LLM Gateway

mohdel is a self-hosted, multi-provider LLM gateway that focuses on inference primitives rather than orchestration. Through process isolation and native OpenTelemetry support, it provides a stable, observable, unified interface to multiple providers in production. Its core design principle is "scope-capping": it deliberately leaves out orchestration, retry/fallback, caching, and similar features, so that the caller keeps full control.

Section 02

Background: Why Do We Need Another LLM Gateway?

In today's LLM ecosystem, developers face a dilemma. Calling each provider's SDK directly means juggling the differences among multiple vendors, while adopting an orchestration framework such as LangChain adds layers of abstraction they may not need. mohdel takes a middle path: it handles only inference primitives, not orchestration, so control stays with the caller.

Section 03

Project Positioning: Scope-Capping at the Inference Primitive Layer

mohdel's core design principle is "scope-capping". It explicitly stays out of the following areas:

Not an orchestrator: no chaining, agent logic, or memory management (these are left to frameworks such as LangChain);
Not a retry/fallback engine: it classifies errors but never retries automatically or switches models;
No response caching: only provider-side caching is supported;
No context-window management: the caller decides what goes into the prompt;
Not a SaaS proxy: it is fully self-hosted, and the user controls the API keys and infrastructure.
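Because mohdel classifies errors but leaves retries and model fallback to the caller, an application typically wraps each call in its own policy. The following is a minimal Python sketch of such a caller-side policy, not mohdel's API: the `GatewayError` type, the category labels (`rate_limited`, `overloaded`, `timeout`, `invalid_request`), and the `call` function are hypothetical stand-ins for whatever the real client exposes.

```python
import time
from typing import Callable, Optional, Sequence

# Hypothetical classification labels; mohdel's real error categories may differ.
RETRYABLE = frozenset({"rate_limited", "overloaded", "timeout"})


class GatewayError(Exception):
    """Hypothetical error type carrying the gateway's error classification."""

    def __init__(self, category: str, message: str = "") -> None:
        super().__init__(message or category)
        self.category = category


def answer_with_policy(
    call: Callable[[str, str], str],
    models: Sequence[str],
    prompt: str,
    attempts_per_model: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Caller-owned policy: retry retryable errors with exponential backoff,
    then fall back to the next model; re-raise non-retryable errors at once."""
    last_error: Optional[GatewayError] = None
    for model in models:
        for attempt in range(attempts_per_model):
            try:
                return call(model, prompt)
            except GatewayError as err:
                if err.category not in RETRYABLE:
                    raise  # e.g. "invalid_request": retrying cannot help
                last_error = err
                # Back off between attempts, but not after the last one.
                if attempt + 1 < attempts_per_model:
                    sleep(base_delay * 2 ** attempt)
    if last_error is None:
        raise ValueError("no models to try")
    raise last_error
```

Keeping this loop in the caller is the point of scope-capping: the application, not the gateway, decides how many attempts a request deserves and which models are acceptable substitutes.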

Section 04

Architecture Design: Three-Plane Isolation for Stability

mohdel uses a three-plane architecture to achieve process isolation:

JS Client: communicates with the backend over a Unix socket; HTTP callers in any language are also supported;
Rust Thin-Gate: the scheduler and state owner, responsible for session management and quota control;
JS Session: the component that actually executes provider calls; each session runs independently.

This design allows thin-gate to run as a subprocess for fault isolation, or to be called inline within a single process.
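Because the backend is reached over a Unix socket and speaks HTTP, a caller in any language only needs an HTTP client that can dial a Unix socket. Below is a minimal Python sketch using only the standard library. The `/v1/answer` route and the JSON request and response shapes are hypothetical placeholders, as are the `UnixHTTPConnection` and `ask` names; mohdel's actual routes and payloads may differ.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that dials a Unix domain socket instead of TCP."""

    def __init__(self, socket_path: str, timeout: float = 30.0) -> None:
        # "localhost" only fills the Host header; no DNS lookup happens.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def ask(socket_path: str, model: str, prompt: str) -> dict:
    """POST a prompt to a hypothetical /v1/answer route and decode the JSON reply."""
    conn = UnixHTTPConnection(socket_path)
    try:
        body = json.dumps({"model": model, "prompt": prompt})
        conn.request("POST", "/v1/answer", body=body,
                     headers={"Content-Type": "application/json"})
        resp = conn.getresponse()
        data = resp.read()
        if resp.status != 200:
            raise RuntimeError(f"gateway returned {resp.status}: {data!r}")
        return json.loads(data)
    finally:
        conn.close()
```

Overriding `connect` is the only change needed: `http.client` still handles request framing and response parsing, so the same pattern carries over to HTTP clients in other languages that accept a custom transport.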

Section 05

Observability: Native OpenTelemetry Support

Each mohdel call automatically produces:

OpenTelemetry Span: a mohdel.session.answer span carrying GenAI semantic-convention attributes (model, token usage, etc.) as well as mohdel-specific attributes;
Trace-linked Logs: log lines written to stderr include correlation fields such as traceId and spanId;
Gate-side OTLP Metrics: the number of active sessions, call statistics, latency distribution, and so on.

Setting OTEL_EXPORTER_OTLP_ENDPOINT enables automatic export of spans and metrics; when it is not set, telemetry adds zero overhead.
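The sketch below illustrates these ideas in plain Python rather than reproducing mohdel's implementation: export is enabled only when OTEL_EXPORTER_OTLP_ENDPOINT is set, span attributes use OpenTelemetry GenAI semantic-convention keys, and log records carry W3C-style hex trace and span IDs. The `gen_ai.request.model`, `gen_ai.usage.input_tokens`, and `gen_ai.usage.output_tokens` keys come from the OpenTelemetry GenAI conventions (still marked as in development); the `mohdel.provider` key, the JSON log layout, and all helper names are hypothetical.

```python
import json
import os
from typing import Mapping, Optional


def otlp_endpoint(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the OTLP endpoint if export is enabled, else None.

    In this sketch the check runs once at startup; when it returns None no
    exporter is created, so the per-call cost is a single branch."""
    endpoint = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "").strip()
    return endpoint or None


def answer_span_attributes(model_id: str, input_tokens: int, output_tokens: int) -> dict:
    """Attributes for a mohdel.session.answer-style span.

    gen_ai.* keys follow the OpenTelemetry GenAI semantic conventions;
    "mohdel.provider" is a hypothetical gateway-specific key."""
    provider, _, model = model_id.partition("/")
    return {
        "gen_ai.request.model": model,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
        "mohdel.provider": provider,
    }


def log_line(message: str, trace_id: int, span_id: int) -> str:
    """A stderr log record correlated with a span (32/16 hex-digit IDs)."""
    return json.dumps({
        "msg": message,
        "traceId": f"{trace_id:032x}",
        "spanId": f"{span_id:016x}",
    })
```

Formatting IDs as fixed-width lowercase hex matches how OpenTelemetry renders trace and span IDs, which lets a log backend join these records to the exported spans.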

Section 06

Usage and Integration: Multi-Provider Support and Toolchain

Supported Providers: 11 providers are currently supported, including Anthropic, OpenAI, and Gemini. Model IDs use the <provider>/<model> format (e.g., gemini/gemini-3-flash-preview).

CLI Tool: after installation, you interact with mohdel through the mo command, for example:

    mo ask anthropic/claude-sonnet-4-6 "explain monads"
    cat article.txt | mo ask openai/gpt-5.4 "summarize in 3 bullets"

The CLI supports streaming output, effort control, and more.

Integration Paths:

Client mode (recommended across processes): communicates over a Unix socket;
Factory mode (quickest within a single process): calls run inline.
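Since every model ID has the form <provider>/<model>, a caller that validates or routes IDs before sending them can split on the first slash. This is a hypothetical helper, not part of mohdel. Splitting only on the first slash is an assumption made so that model names containing slashes stay intact, and the provider set lists only the three providers named in this article, not all 11.

```python
from typing import AbstractSet, NamedTuple

# Only the providers named in this article; mohdel supports 11 in total.
KNOWN_PROVIDERS = frozenset({"anthropic", "openai", "gemini"})


class ModelId(NamedTuple):
    provider: str
    model: str


def parse_model_id(model_id: str, known: AbstractSet[str] = KNOWN_PROVIDERS) -> ModelId:
    """Split '<provider>/<model>' on the first slash and validate both parts."""
    provider, sep, model = model_id.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected '<provider>/<model>', got {model_id!r}")
    if provider not in known:
        raise ValueError(f"unknown provider {provider!r}")
    return ModelId(provider, model)
```

For example, `parse_model_id("gemini/gemini-3-flash-preview")` yields `ModelId(provider='gemini', model='gemini-3-flash-preview')`, while a bare `"gpt-5.4"` is rejected before any request is made.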

Section 07

Practical Significance and Summary

Applicable Scenarios:

You need a unified multi-provider interface without adopting a heavyweight orchestration framework;
You have strict observability requirements;
You want a simple architecture with a clear division of responsibilities;
You need process-level fault isolation in production.

Summary: mohdel embodies a "less is more" design philosophy. By drawing clear capability boundaries, it leaves control with the caller while providing production-grade observability and stability. It can serve as the underlying inference layer for frameworks like LangChain, or as the base for custom orchestration logic.
