# Miramar Platform: Engineering Practice and Architecture Design of a Hybrid Cloud AI Platform

> This article introduces the Miramar Platform project, a hybrid AI platform that combines local DGX workstations with GCP cloud resources. The project demonstrates how to build reproducible MLOps workflows using Terraform, GKE, Workload Identity Federation, and self-hosted GPU runners.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-06-08T16:16:10.000Z
- Last activity: 2026-06-08T16:20:51.117Z
- Popularity: 154.9
- Keywords: hybrid cloud AI platform, DGX Spark, GCP, GKE, MLOps, Terraform, Workload Identity Federation, GitHub Actions, self-hosted runner, Kubeflow
- Page link: https://www.zingnex.cn/en/forum/thread/miramar-platform-ai
- Canonical: https://www.zingnex.cn/forum/thread/miramar-platform-ai

---

## Miramar Platform: Hybrid Cloud AI Platform Core Overview

Miramar Platform is a hybrid AI platform developed by miramar-labs-org (source: GitHub repo https://github.com/miramar-labs-org/miramar-platform-gcp, updated 2026-06-08). It integrates local NVIDIA DGX Spark and Jetson AGX Orin devices with Google Cloud Platform (GCP) resources, so that teams are not forced to choose between a cloud-only setup and a local-only one. Key technologies include Terraform, GKE, Workload Identity Federation, and self-hosted GPU runners, combined into MLOps workflows that aim to make AI pipelines reproducible.

## Project Background & Core Vision

AI teams face two main infrastructure problems: relying entirely on the cloud leads to high GPU costs and data privacy risks, while deploying entirely on local hardware gives up elastic scalability. Miramar Platform's core vision is to combine the two: training on sensitive data is done locally, while inference and collaboration are handled in the cloud. This approach balances data privacy against cloud convenience.

## Local Hardware Architecture

The platform uses three heterogeneous local machines:
1. WSL2 Workstation: Ubuntu 22.04 on a Windows laptop, RTX 4060 (8 GB), acts as a GitHub Actions self-hosted runner for lightweight CI/CD tasks.
2. Jetson AGX Orin: 64 GB unified memory, 2048 CUDA cores, Ubuntu 22.04 with JetPack 6.x (arm64), suitable for edge AI inference and lightweight training.
3. DGX Spark: 128 GB unified memory, GB10 Superchip (6144 CUDA cores, 192 Tensor Cores), handles large model fine-tuning and complex training.

All machines use the same mlabs-runner Docker image (multi-arch: amd64/arm64) to simplify operations.
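As a rough illustration of what a shared multi-arch image means in practice, the sketch below maps a machine's CPU architecture to the image variant it would pull. The helper names and the `mlabs-runner:latest` reference are assumptions for illustration only; the source names the image but not its registry path or tag.

```python
import platform

# Values reported by platform.machine() mapped to Docker/OCI platform strings.
_DOCKER_PLATFORMS = {
    "x86_64": "linux/amd64",
    "amd64": "linux/amd64",
    "aarch64": "linux/arm64",
    "arm64": "linux/arm64",
}


def docker_platform(machine=None):
    """Return the Docker platform string for a CPU architecture.

    Falls back to the current host's architecture when `machine` is omitted.
    """
    name = (machine or platform.machine()).lower()
    try:
        return _DOCKER_PLATFORMS[name]
    except KeyError:
        raise ValueError(f"no image variant known for architecture {name!r}")


def pull_command(image="mlabs-runner:latest", machine=None):
    """Build a `docker pull` command for the matching image variant.

    The default image reference is hypothetical.
    """
    return ["docker", "pull", "--platform", docker_platform(machine), image]
```

With a multi-arch manifest, Docker normally selects the matching variant on its own; passing `--platform` explicitly just makes the choice visible, for example `pull_command(machine="aarch64")` for an arm64 host such as the Jetson AGX Orin.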

## Local AI Software Stack

DGX Spark runs a complete local AI software stack:
- Minikube: Local Kubernetes for container orchestration.
- NeMo Microservices: NVIDIA's framework for large model training/fine-tuning/inference.
- MLflow & MinIO: Experiment tracking/model version management with S3-compatible storage.
- Qdrant: Vector database for RAG semantic search.
- Kubeflow Pipelines: Orchestration for complex ML workflows.
- Ollama & NIM: Local inference services (Ollama for consumer models, NIM for enterprise NVIDIA-optimized models).

These services are exposed via SSH tunnels to development workstations for a cloud-like local experience.
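The SSH tunnelling described here can be sketched as a small helper that builds one `ssh` command with a local port forward per service. The host alias `dgx-spark` and the port table are assumptions: the ports are the usual defaults for MLflow, MinIO, Qdrant, and Ollama, and the project's actual ports may differ, especially for services running inside Minikube.

```python
# Assumed default ports; the project's real port assignments are not given
# in the source.
DEFAULT_SERVICES = {
    "mlflow": 5000,
    "minio": 9000,
    "qdrant": 6333,
    "ollama": 11434,
}


def tunnel_command(host, services=None):
    """Build an `ssh` command that forwards each service port to localhost.

    -N asks ssh not to run a remote command, so the session only carries
    the forwards. Each -L maps local port P to port P on the remote host.
    """
    ports = sorted((services or DEFAULT_SERVICES).values())
    cmd = ["ssh", "-N"]
    for port in ports:
        cmd += ["-L", f"{port}:localhost:{port}"]
    cmd.append(host)
    return cmd


if __name__ == "__main__":
    # "dgx-spark" is a hypothetical SSH host alias.
    print(" ".join(tunnel_command("dgx-spark")))
```

Once such a tunnel is open, a workstation tool pointed at `http://localhost:5000` would reach the MLflow server on the DGX Spark, which is what gives the setup its cloud-like feel.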

## Cloud Architecture (GCP)

The cloud part uses Terraform for infrastructure-as-code (IaC) to ensure reproducibility:
- GKE Standard Cluster (miramar-shared-gke): Shared Kubernetes layer for various workloads.
- Artifact Registry (apps repo): Stores application images.
- GCS Buckets: Persist Terraform state and GKE node pool snapshots.
- Workload Identity Federation: Enables keyless authentication from GitHub Actions to GCP, which improves security by removing the need for long-lived service account keys.
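To make the keyless setup more concrete, the sketch below builds the two identifiers that a GitHub Actions to GCP federation typically revolves around: the provider resource name that the workflow presents, and the `principalSet` member used in IAM bindings to restrict access to one repository. The project number, pool ID, and provider ID are placeholders, and the `attribute.repository` form assumes the provider maps that attribute from the GitHub token's `repository` claim; none of these values come from the source.

```python
def provider_resource(project_number, pool_id, provider_id):
    """Full resource name of a workload identity pool provider."""
    return (
        f"projects/{project_number}/locations/global/"
        f"workloadIdentityPools/{pool_id}/providers/{provider_id}"
    )


def repo_principal_set(project_number, pool_id, repository):
    """IAM member covering every identity from one GitHub repository.

    Assumes the provider defines the mapping
    attribute.repository = assertion.repository.
    """
    return (
        "principalSet://iam.googleapis.com/"
        f"projects/{project_number}/locations/global/"
        f"workloadIdentityPools/{pool_id}/attribute.repository/{repository}"
    )


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    number, pool, provider = "123456789012", "github-pool", "github-provider"
    print(provider_resource(number, pool, provider))
    print(repo_principal_set(number, pool, "miramar-labs-org/miramar-platform-gcp"))
```

Binding roles to the repository-scoped `principalSet` rather than to the whole pool is what keeps other repositories from impersonating the same service account.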

## CI/CD & Project Factory

CI/CD is powered by GitHub Actions:
- Self-hosted runners: All three local machines are registered as runners (labeled wsl2, dgx, agx), so that jobs needing a GPU, local network access, or arm64 are routed to the appropriate machine.
- Workflow matrix: Covers the platform lifecycle (create/destroy/expand/recover), GPU quota management, local AI service deployment, runner image building, WSL2 configuration, and more. Each workflow has a corresponding destroy/uninstall workflow.

Project Factory: Projects created from a template automatically get a notebook-first development environment (JupyterLab), pre-configured CI/CD, platform integration, local execution configuration, and standard docs. The first template is a Kubeflow Pipelines fine-tuning project: fine-tuning on PHI data runs locally, and the resulting desensitized model is then sent to GCP for inference.
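The label-based runner routing in this section can be modelled as a small selection function. This is a sketch under stated assumptions, not the project's actual routing logic: the labels and memory figures come from the hardware list in the article, while the `Runner` type, the "smallest machine that fits" policy, and treating the DGX Spark as arm64 are illustrative choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Runner:
    label: str          # runner label used in the workflow's runs-on
    arch: str           # CPU architecture of the machine
    gpu_memory_gb: int  # GPU-accessible memory (VRAM or unified memory)


# Figures taken from the hardware list; the arch of "dgx" is an assumption.
RUNNERS = (
    Runner("wsl2", "amd64", 8),
    Runner("agx", "arm64", 64),
    Runner("dgx", "arm64", 128),
)


def runs_on(min_memory_gb=0, arch=None):
    """Pick the smallest runner that satisfies a job's requirements.

    Returns a label list suitable for a workflow's `runs-on` key.
    """
    candidates = [
        r for r in RUNNERS
        if r.gpu_memory_gb >= min_memory_gb and (arch is None or r.arch == arch)
    ]
    if not candidates:
        raise LookupError(f"no runner with {min_memory_gb} GB and arch={arch}")
    best = min(candidates, key=lambda r: r.gpu_memory_gb)
    return ["self-hosted", best.label]
```

Under this policy a lightweight lint job resolves to the WSL2 workstation, an arm64 inference test to the Jetson, and a large fine-tuning job to the DGX Spark, which keeps the largest machine free for work that needs it.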

## Engineering Highlights & Applicable Scenarios

Key engineering practices:
1. Keyless authentication (Workload Identity Federation) reduces credential risks.
2. Unified multi-arch container image simplifies CI/CD and operations.
3. Full lifecycle management avoids orphaned cloud resources.
4. Docs-as-code ensures knowledge transfer.

Applicable scenarios:
- AI teams handling sensitive data (medical/finance).
- Organizations wanting to reduce cloud GPU costs.
- Projects needing edge AI capabilities.
- Teams adopting IaC practices.

Conclusion: The hybrid model balances data sovereignty against cloud elasticity, and it could become a standard pattern for future AI infrastructure.
