# RBG: An LLM Inference Service Orchestration Framework for Kubernetes

> RBG (RoleBasedGroup) is a Kubernetes API specifically designed for orchestrating distributed, stateful AI inference workloads. It supports multi-role collaboration and built-in service discovery, making it particularly suitable for production deployment of decoupled architectures such as Prefill/Decode separation.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-04-07T06:11:59.000Z
- Last activity: 2026-04-07T08:10:01.664Z
- Popularity: 143.0
- Keywords: Kubernetes, LLM inference, cloud native, AI infrastructure, distributed systems
- Page link: https://www.zingnex.cn/en/forum/thread/rbg-kubernetesllm
- Canonical: https://www.zingnex.cn/forum/thread/rbg-kubernetesllm
- Markdown source: floors_fallback

---

## Introduction

RBG uses a role-based organizational abstraction to address three limitations of traditional Kubernetes primitives: fragmented multi-role topology management, insensitivity to hardware topology, and the lack of atomic cross-role operations. In their place it offers a unified orchestration view and efficient collaboration between roles for LLM inference services.

## Background: Limitations of Traditional Kubernetes Primitives

Modern high-performance LLM inference systems often adopt decoupled architectures (e.g., Prefill/Decode separation), which form complex topologies with multiple roles such as Gateway and Router. Native Kubernetes resources such as StatefulSet and Deployment face the following challenges in this setting:
1. **Difficulty in multi-role topology management**: Each role must be managed as a separate resource, with no unified orchestration view;
2. **Hardware topology insensitivity**: It is hard to fully exploit hardware features such as NVLink and PCIe;
3. **Lack of atomic operations**: Cross-role operations such as deployment and upgrade are not coordinated, which can easily lead to service interruptions or inconsistent state.

## Core Concept of RBG: Role-Based Organizational Abstraction

RBG views inference services as role-based organizations. Its core concepts include:
- **Role**: The basic scheduling unit. Each role (e.g., Prefill, Decode) has independent specifications, lifecycle, and policies, with configurable relationships between roles;
- **RoleBasedGroup**: A set of roles forming a logical service, managed as a single unit that is topology-aware, stateful, and collaborative, rather than as a collection of isolated resources.
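
To make the two concepts concrete, here is a minimal Python sketch of how a group of roles could be modeled as one unit. It is an illustration only: the class and field names (`Role`, `replicas`, `depends_on`, `validate`) are hypothetical and do not reflect the actual RBG API schema.

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    """One role in an inference service (hypothetical fields, not the RBG schema)."""
    name: str
    replicas: int = 1
    depends_on: list[str] = field(default_factory=list)


@dataclass
class RoleBasedGroup:
    """A set of roles managed together as one logical service."""
    name: str
    roles: list[Role] = field(default_factory=list)

    def role(self, name: str) -> Role:
        """Look up a role by name."""
        for r in self.roles:
            if r.name == name:
                return r
        raise KeyError(name)

    def validate(self) -> None:
        """Reject duplicate role names and dependencies on unknown roles."""
        names = [r.name for r in self.roles]
        if len(names) != len(set(names)):
            raise ValueError("duplicate role name")
        for r in self.roles:
            for dep in r.depends_on:
                if dep not in names:
                    raise ValueError(f"{r.name} depends on unknown role {dep}")

    def total_replicas(self) -> int:
        """Sum replicas across all roles in the group."""
        return sum(r.replicas for r in self.roles)
```

Validating the group as a whole, rather than each role in isolation, is the point of the sketch: a dangling dependency is caught before anything is deployed.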

## Five Core Capabilities of RBG (SCOPE)

RBG provides five core capabilities, summarized as SCOPE:
1. **Topology-aware deterministic operations**: Precisely controls the impact of upgrades and scaling through RoleID injection and the principle of the minimal replacement domain;
2. **Cross-role policy engine**: Supports deployment pairing, coordinated upgrades, linked recovery, and coordinated scaling;
3. **Role dependency management**: Defines role dependencies and startup order (e.g., Decode starts only after Prefill is ready);
4. **Topology self-aware service discovery**: Injects topology information into Pods, eliminating external dependencies;
5. **Topology-aware placement**: Schedules with hardware affinity (GPU-NVLink > PCIe > RDMA > VPC) and role affinity in mind.
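
Two of these capabilities, dependency-ordered startup and interconnect-aware placement, can be illustrated with a short Python sketch. This is not RBG's implementation: the dependency map, the `(node, interconnect)` candidate format, and both function names are assumptions made for the example. Only the Decode-after-Prefill ordering and the NVLink > PCIe > RDMA > VPC preference come from the list above.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: role -> roles that must be ready first.
DEPENDS_ON = {
    "prefill": set(),
    "decode": {"prefill"},
}

# Interconnect tiers in the preference order listed above (lower rank is better).
INTERCONNECT_RANK = {"nvlink": 0, "pcie": 1, "rdma": 2, "vpc": 3}


def startup_order(depends_on):
    """Return roles in an order where every role follows its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


def pick_placement(candidates):
    """Pick the (node, interconnect) candidate with the best-ranked interconnect."""
    return min(candidates, key=lambda c: INTERCONNECT_RANK[c[1]])


if __name__ == "__main__":
    print(startup_order(DEPENDS_ON))
    print(pick_placement([("node-a", "pcie"), ("node-b", "nvlink")]))
```

A topological sort also surfaces circular dependencies early: `TopologicalSorter` raises `graphlib.CycleError` if two roles wait on each other.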

## Typical Application Scenarios of RBG

RBG is particularly suitable for the following scenarios:
- **Large-scale production deployment**: Manage tens to hundreds of GPU instances and reduce operational complexity;
- **Decoupled architectures**: Support advanced architectures such as Prefill/Decode separation and speculative decoding;
- **Multi-tenant environments**: Clearly partition and isolate resources for different models/user groups;
- **Hybrid cloud deployment**: Optimize traffic routing and failover across availability zones/cloud providers.

## Version Compatibility and Ecosystem

RBG is compatible with the Kubernetes ecosystem:

| RBG Version | Kubernetes Version | LeaderWorkerSet Version |
|-------------|--------------------|-------------------------|
| main        | >=v1.28.x          | >=v0.7.0                |
| v0.4.0      | >=v1.28.x          | >=v0.7.0                |
| v0.3.0      | >=v1.28.x          | >=v0.6.0                |

The project reuses LeaderWorkerSet code, follows Kubernetes community practices, and adopts an open governance model.
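
The minimum versions in the compatibility table can be turned into a simple pre-install check. The sketch below is a hypothetical helper, not part of RBG. It copies the minimums from the table and assumes version strings contain only numeric components (for example `v1.29.3`, without vendor suffixes).

```python
# Minimum versions copied from the compatibility table above.
MIN_VERSIONS = {
    "main":   {"kubernetes": (1, 28), "lws": (0, 7, 0)},
    "v0.4.0": {"kubernetes": (1, 28), "lws": (0, 7, 0)},
    "v0.3.0": {"kubernetes": (1, 28), "lws": (0, 6, 0)},
}


def parse_version(text):
    """Turn 'v1.29.3' into (1, 29, 3); assumes plain numeric components."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def is_compatible(rbg, kubernetes, lws):
    """Check Kubernetes and LeaderWorkerSet versions against the table minimums."""
    minimum = MIN_VERSIONS[rbg]
    return (parse_version(kubernetes) >= minimum["kubernetes"]
            and parse_version(lws) >= minimum["lws"])
```

For example, `is_compatible("v0.4.0", "v1.29.3", "v0.6.1")` is false because v0.4.0 requires LeaderWorkerSet v0.7.0 or later, while the same cluster would satisfy the v0.3.0 row.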

## Conclusion and Recommendations

RBG represents a significant advancement in AI inference orchestration on Kubernetes, addressing the core challenges of traditional primitives. As LLM inference scales and architectures become more complex, RBG is well placed to become a standard choice in production environments. Teams building or expanding LLM inference infrastructure are advised to evaluate RBG carefully and consider adopting it.
