# InferRouter: A Self-Hosted Multi-Provider LLM Inference Proxy for .NET

> InferRouter is a self-hosted LLM inference proxy designed for .NET projects. It offers a unified OpenAI-compatible interface with multi-provider failover, rate limit tracking, and structured operation logs, enabling seamless model switching and local GGUF fallback.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-05-26T18:42:17.000Z
- Last activity: 2026-05-26T18:49:41.776Z
- Popularity: 150.9
- Keywords: .NET, LLM proxy, OpenAI compatible, multi-provider, failover, GGUF, LlamaSharp, rate limiting
- Page link: https://www.zingnex.cn/en/forum/thread/inferrouter-net-llm
- Canonical: https://www.zingnex.cn/forum/thread/inferrouter-net-llm

---

## InferRouter: Overview of the Self-Hosted Multi-Provider LLM Inference Proxy for .NET

InferRouter is a self-hosted LLM inference proxy for .NET projects, developed and maintained by vvidman and released on GitHub on May 26, 2026 (original link: https://github.com/vvidman/InferRouter). Its core features are a unified OpenAI-compatible interface, multi-provider failover, rate limit tracking, structured operation logs, and a local GGUF model fallback based on LlamaSharp. Together these help developers switch models seamlessly and keep inference highly available.

## Challenges of LLM Multi-Provider Integration and Limitations of Traditional Solutions

As the LLM ecosystem grows, developers increasingly need to switch between multiple providers, because any single provider may suffer outages, impose rate limits, or fit some tasks poorly. Traditional solutions hard-code several SDKs, handle failover by hand, and manage API keys in scattered places, which makes the code complex and hard to extend. InferRouter addresses these problems with a unified interface and intelligent routing, so callers get multi-provider resilience transparently.

## Analysis of Core Architecture and Key Mechanisms

InferRouter adopts a layered architecture with the following core components:
1. **Unified API Layer**: Exposes an OpenAI-compatible `/v1/chat/completions` endpoint, so existing OpenAI clients can be pointed at the proxy.
2. **Failover Executor**: Tries providers in the configured order and automatically switches to the next one when it encounters a recoverable error (e.g., a 429 rate limit response).
3. **Rate Limit Tracker**: Maintains local quota counts, with a daily counter that resets at UTC midnight and a 60-second sliding window for requests per minute (RPM), to avoid sending requests that would be rejected.
4. **Error Normalizer**: Converts provider-specific errors into unified categories (RateLimit, AuthError, etc.) so that the failover logic behaves consistently.
5. **Operation Logs**: Generates structured logs in JSONL format, including the request ID, provider, model, and token consumption, for monitoring and debugging.
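
The interplay between the Failover Executor and the Error Normalizer can be sketched roughly as follows. This is a minimal, language-neutral illustration in Python, not InferRouter's actual C# code. The names `ProviderError`, `normalize`, `Provider`, and `infer_with_failover` are hypothetical, as are the `Unavailable` and `BadRequest` categories and the choice of which categories count as recoverable; the source only names RateLimit and AuthError.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class ErrorCategory(Enum):
    RATE_LIMIT = "RateLimit"      # named in the text
    AUTH_ERROR = "AuthError"      # named in the text
    UNAVAILABLE = "Unavailable"   # hypothetical extra category
    BAD_REQUEST = "BadRequest"    # hypothetical extra category


# Assumption: only these categories justify trying the next provider.
RECOVERABLE = {ErrorCategory.RATE_LIMIT, ErrorCategory.UNAVAILABLE}


class ProviderError(Exception):
    """Hypothetical error carrying the HTTP status a provider returned."""

    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


def normalize(status: int) -> ErrorCategory:
    """Map a raw HTTP status onto a provider-independent category."""
    if status == 429:
        return ErrorCategory.RATE_LIMIT
    if status in (401, 403):
        return ErrorCategory.AUTH_ERROR
    if status >= 500:
        return ErrorCategory.UNAVAILABLE
    return ErrorCategory.BAD_REQUEST


@dataclass
class Provider:
    name: str
    call: Callable[[str], str]  # takes a prompt, returns a completion


def infer_with_failover(providers, prompt):
    """Try providers in order; move on only after a recoverable error."""
    attempts = []
    for provider in providers:
        try:
            return provider.name, provider.call(prompt), attempts
        except ProviderError as err:
            category = normalize(err.status)
            attempts.append((provider.name, category))
            if category not in RECOVERABLE:
                raise  # e.g. a malformed request fails on every provider
    raise RuntimeError(f"all providers failed: {attempts}")


def rate_limited(prompt: str) -> str:
    raise ProviderError(429)  # simulates a cloud provider out of quota


def local_echo(prompt: str) -> str:
    return f"echo: {prompt}"  # stands in for a local model


chain = [Provider("cloud-a", rate_limited), Provider("local-gguf", local_echo)]
name, reply, attempts = infer_with_failover(chain, "hi")
```

Because the executor decides on normalized categories rather than raw status codes, adding a provider with different error conventions would only require extending `normalize`, not the failover loop.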

## Flexible Configuration and Local GGUF Model Support

The provider chain is defined in configuration files, so it can be adjusted without modifying code. Two provider types are supported: `openai_compatible` (cloud providers that expose an OpenAI-compatible interface) and `local_gguf` (local models). The sample configuration includes quota controls (a daily request limit and a per-minute limit) and error mapping rules. Local GGUF models are integrated through LlamaSharp and run in-process as the final fallback, which suits offline or privacy-sensitive scenarios.
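
To make the two quota controls concrete, here is a rough Python sketch of a tracker that combines a daily limit (reset at UTC midnight) with a per-minute limit (60-second sliding window). It is an illustration under assumptions, not InferRouter's implementation: the class name `QuotaTracker`, its parameters, and the method `try_acquire` are all hypothetical.

```python
from collections import deque
from datetime import datetime, timedelta, timezone


class QuotaTracker:
    """Hypothetical local quota counter for a single provider."""

    def __init__(self, daily_limit: int, rpm_limit: int):
        self.daily_limit = daily_limit
        self.rpm_limit = rpm_limit
        self._day = None        # UTC date the daily counter belongs to
        self._daily_count = 0
        self._recent = deque()  # timestamps of requests in the last 60 s

    def try_acquire(self, now=None) -> bool:
        """Record a request and return True if both quotas allow it."""
        now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)

        # Daily quota: start a fresh count when the UTC date changes.
        if now.date() != self._day:
            self._day = now.date()
            self._daily_count = 0

        # Sliding window: forget requests that are 60 s old or older.
        cutoff = now - timedelta(seconds=60)
        while self._recent and self._recent[0] <= cutoff:
            self._recent.popleft()

        if self._daily_count >= self.daily_limit:
            return False
        if len(self._recent) >= self.rpm_limit:
            return False

        self._daily_count += 1
        self._recent.append(now)
        return True


t0 = datetime(2026, 5, 26, 12, 0, 0, tzinfo=timezone.utc)
tracker = QuotaTracker(daily_limit=3, rpm_limit=2)
results = [
    tracker.try_acquire(t0),                          # allowed
    tracker.try_acquire(t0 + timedelta(seconds=10)),  # allowed
    tracker.try_acquire(t0 + timedelta(seconds=20)),  # denied: 2 in window
    tracker.try_acquire(t0 + timedelta(seconds=65)),  # allowed: t0 expired
    tracker.try_acquire(t0 + timedelta(minutes=5)),   # denied: daily limit
    tracker.try_acquire(t0 + timedelta(hours=12, seconds=1)),  # next UTC day
]
```

Checking the quota locally before calling a provider lets the proxy skip a provider it already knows is exhausted, instead of spending a round trip on a request that would come back as a 429.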

## Security Design and Observability Assurance

**Security**: API keys are managed with Docker Secrets and mounted as files under `/run/secrets/`, which avoids leaking them through environment variables and supports rotation without restarting the service.

**Observability**: Operation logs are written in JSONL format, with event types such as `infer_started`, `infer_completed`, and `infer_fallback`. They can be shipped to platforms such as ELK and Grafana Loki for real-time monitoring, alerting, and cost analysis.
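
As a rough illustration of why JSONL is convenient for this kind of analysis, the Python sketch below writes a few events and then sums token usage per provider. Only the three event type names come from the text above. The field names (`ts`, `event`, `request_id`, `provider`, `model`, `reason`, `total_tokens`) and the helper functions are assumptions made for the example and may not match InferRouter's actual log schema.

```python
import io
import json
from collections import Counter
from datetime import datetime, timezone


def write_event(stream, event: str, **fields) -> None:
    """Append one JSON object per line (the JSONL convention)."""
    record = {"ts": datetime.now(timezone.utc).isoformat(), "event": event, **fields}
    stream.write(json.dumps(record) + "\n")


def tokens_by_provider(lines) -> dict:
    """Sum total_tokens over infer_completed events, grouped by provider."""
    totals = Counter()
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        if record.get("event") == "infer_completed":
            totals[record["provider"]] += record.get("total_tokens", 0)
    return dict(totals)


# An in-memory stream stands in for the log file.
log = io.StringIO()
write_event(log, "infer_started", request_id="r1", provider="cloud-a", model="m-small")
write_event(log, "infer_fallback", request_id="r1", provider="cloud-a", reason="RateLimit")
write_event(log, "infer_completed", request_id="r1", provider="local-gguf",
            model="m-local", total_tokens=42)
write_event(log, "infer_completed", request_id="r2", provider="local-gguf",
            model="m-local", total_tokens=8)

totals = tokens_by_provider(log.getvalue().splitlines())
```

Since every line is an independent JSON object, log shippers can tail the file and parse records one at a time without buffering the whole log.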

## Deployment Methods and Applicable Scenarios

**Tech Stack**: Built on .NET 10 and the ASP.NET Core Minimal API; local inference relies on LlamaSharp 0.20.0.

**Deployment**: Deployed via Docker Compose with a concise configuration that covers key mounting and the mapping of model and log directories.

**Applicable Scenarios**: High availability (multi-provider redundancy), cost optimization (prioritizing low-cost providers), model diversity (matching different models to different tasks), and data privacy compliance (local models keep data from leaving the host).
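
The key mounting mentioned under Deployment pairs with the `/run/secrets/` convention from the security section. One way a service can pick up a rotated key without a restart is to read the mounted file each time the key is needed, as in the Python sketch below. Whether InferRouter works exactly this way is not stated in the source, so treat this as an assumption; the helper `read_secret` and the secret name `provider_a_api_key` are hypothetical, and a temporary directory stands in for `/run/secrets`.

```python
import tempfile
from pathlib import Path


def read_secret(name: str, secrets_dir: str = "/run/secrets") -> str:
    """Read a file-mounted secret, stripping the trailing newline."""
    return (Path(secrets_dir) / name).read_text(encoding="utf-8").strip()


# Demo: a temporary directory plays the role of the mounted secrets folder.
with tempfile.TemporaryDirectory() as tmp:
    secret_file = Path(tmp) / "provider_a_api_key"

    secret_file.write_text("key-v1\n", encoding="utf-8")
    before = read_secret("provider_a_api_key", tmp)

    # Simulate rotation: the mounted file content is replaced.
    secret_file.write_text("key-v2\n", encoding="utf-8")
    after = read_secret("provider_a_api_key", tmp)
```

Reading the file on demand trades a small amount of I/O per request for never holding a stale key in memory; a cached value with a short expiry would be a reasonable alternative.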

## Summary: The Value and Significance of InferRouter

InferRouter moves LLM application architecture from a tightly coupled, single-provider design toward a flexible, configurable multi-provider proxy, addressing production needs for security, observability, high availability, and cost-effectiveness. For .NET developers it offers an out-of-the-box solution that removes the need to handle provider API differences or write complex failover logic, and it serves as a useful abstraction layer as the LLM ecosystem evolves.
