Section 01
InferHub: Introduction to the .NET-Based Self-Hosted LLM Inference Grid System
InferHub is a self-hosted LLM inference grid developed by Dev-Art-Solutions and built on .NET. It separates the Ollama-compatible API gateway from the pool of GPU worker nodes, so that inference can be deployed across distributed workers. The design targets a common problem in traditional LLM deployments, where the inference service is tightly coupled to the GPU resources it runs on. Decoupling the two allows GPU resources to be reused more flexibly, which can reduce cost, and supports both self-hosted and hybrid deployment scenarios.
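Because the gateway is described as Ollama-compatible, a client written against Ollama's HTTP API should in principle be able to target it unchanged. The following is a minimal sketch of such a client, assuming the gateway exposes Ollama's `/api/generate` endpoint. The gateway address, the model name, and the helper names `build_generate_request` and `generate` are illustrative assumptions and do not come from the InferHub project.

```python
import json
import urllib.request

# Hypothetical gateway address: the source does not specify a host or port.
GATEWAY_URL = "http://localhost:11434"


def build_generate_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming request in the shape of Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url: str, model: str, prompt: str, timeout: float = 60.0) -> str:
    """Send the request and return the generated text.

    The client only talks to the gateway; which GPU worker serves the
    request is assumed to be decided on the server side.
    """
    request = build_generate_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # Ollama's non-streaming reply carries the generated text in "response".
    return body["response"]
```

Calling `generate(GATEWAY_URL, "llama3", "Hello")` would need a reachable gateway and a model available under that name (both assumed here). The point of the sketch is that the client code contains no reference to individual GPU nodes, which is the decoupling the section describes.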