# GenMLX: Building Large Model Inference Clusters with Multiple Apple Silicon Macs

> GenMLX is an open-source project that connects multiple Apple Silicon Macs (M-series) via Thunderbolt 5 network to form a tensor parallel inference cluster for running large-parameter language models. It supports Web UI management, OpenAI-compatible API, L2 disk cache, and heterogeneous memory configuration, with deployment achievable in 15 minutes.

- 板块: [Openclaw Geo](https://www.zingnex.cn/en/forum/board/openclaw-geo)
- 发布时间: 2026-06-04T11:15:41.000Z
- 最近活动: 2026-06-04T11:18:44.650Z
- 热度: 163.9
- 关键词: Apple Silicon, MLX, 大语言模型, 分布式推理, Thunderbolt 5, 张量并行, 本地部署, 机器学习, Mac, 开源
- 页面链接: https://www.zingnex.cn/en/forum/thread/genmlx-apple-silicon-mac
- Canonical: https://www.zingnex.cn/forum/thread/genmlx-apple-silicon-mac
- Markdown 来源: floors_fallback

---

## GenMLX: Open-Source Project for Apple Silicon Macs to Build Large Model Inference Clusters

GenMLX is an open-source project that connects multiple Apple Silicon Macs (M-series) via Thunderbolt 5 to form a tensor parallel inference cluster for running large parameter language models. Key features include Web UI management, OpenAI-compatible API, L2 disk cache, heterogeneous memory configuration, and deployment in 15 minutes. It addresses the memory bottleneck of single Macs for large models.

## Background & Problem Solved

Traditional single-machine inference on Apple Silicon Macs is limited by unified memory capacity, making it hard to run models over 100B parameters. GenMLX, built on Apple's MLX framework, uses Thunderbolt5's high-speed network to create a distributed cluster, breaking this limit and allowing integration of multiple Mac devices (M1 Max, M3 Ultra, etc.) into a unified inference engine.

## Core Architecture & Technical Principles

**Control Plane (Master-Agent):** Master node manages Web UI, REST API, registry, and task scheduling; Agent runs on each worker node, responding to Master commands with HTTP + Bearer Token (no SSH keys needed).

**Data Plane (Dispatcher):** FastAPI-based core service wrapping mlx-lm, supporting continuous batching and L2 cache, using mx.distributed for node communication over Thunderbolt5.

**Network Flexibility:** Supports TB5 RDMA (best performance), TB4/3 RDMA, and 10/1 GbE as backup; mesh setup wizard auto-generates IP plans for 1-6 nodes (full mesh/ring topology).

## Key Functional Features

**Heterogeneous Memory Support:** Automatically chooses tensor parallel (homogeneous) or pipeline parallel (heterogeneous) for mixed Mac configs (e.g., 192GB Mac Studio +32GB Mac mini +96GB MacBook Pro).

**L2 Disk Cache:** 200GB+ SSD cache for KV state, reduces cold start prefill from 88 mins to 37 secs, saves snapshots at system prompt boundaries for reuse.

**API Compatibility:** OpenAI-compatible API (/v1/chat/completions etc.), native Anthropic API adapter, tool/function call support, thinking token routing.

## Deployment & Usage Experience

**Quick Installation:** Master node via `curl | bash --master` (installs Python3.11+uv+macmon, sets venv, generates token, launchd service, opens UI at localhost:6789). Worker node via `curl | bash --agent` with master URL and token (registers in 30 secs).

**Web UI:** Manages model lifecycle (download/sync/serve), checks model presence across nodes, real-time telemetry (CPU/GPU/RAM/SSD), config panel for tools like Claude Code.

## Performance & Limitations

**Current State:** Pre-alpha (v0.1.0.dev0, phase 0 of 7).

**Hardware Reqs:** Apple Silicon Macs, Thunderbolt5 recommended, 1-6 nodes.

**Differences from Similar Projects:** Focuses on fixed, owned topology (1-6 Macs on private network); EXO Labs is better for elastic/dynamic device discovery across mobile/desktop.

## Practical Significance & Application Scenarios

GenMLX solves scenarios like: 1. Privacy-first local inference (no API keys, data stays local). 2. Hardware asset reuse (integrate existing Macs).3. Local deployment of large models (DeepSeek V4, Qwen3-Coder-Next, GLM-4.7 etc.).4. Integration with tools (Claude Code, Cline, OpenWebUI).

## Conclusion & Future Outlook

GenMLX is an important attempt at distributed AI inference in the Apple Silicon ecosystem. It leverages Thunderbolt5 and MLX framework to enable local large model runs. Its architecture (control/data plane separation, heterogeneous support, API compatibility) caters to real deployment needs. As it matures (target v1.0.0), it's expected to become a top choice for Apple Silicon users to deploy local large models.