Zing Forum


Cadence: Experimental LLM Inference Framework for Apple Silicon Using MPSGraph

Cadence is a local LLM inference experiment for macOS, built with Swift, SwiftUI, and Metal Performance Shaders Graph, focused on verifying GPU implementations of core Transformer operators.

Tags: Swift · MPSGraph · Apple Silicon · Metal · Transformer · Local inference · LLM · On-device AI
Published: 2026/04/26 11:37 · Last activity: 2026/04/26 11:50 · Estimated reading time: 5 minutes

Section 01

Cadence: Experimental LLM Inference Framework for Apple Silicon Using MPSGraph

Cadence is an experimental project by Ostinato Labs, built with Swift, SwiftUI, and Metal Performance Shaders Graph (MPSGraph) for native LLM inference on macOS. It focuses on verifying core Transformer operators on the GPU and is currently in an early R&D phase, not a ready-to-use chat app. It serves as an operator testbed, a CPU-GPU validation tool, and a prototype for future local inference engines.


Section 02

Background & Project Positioning

As Apple Silicon grows more powerful, leveraging Metal for LLM inference has become a key focus for developers. Cadence is an early-stage R&D project, not a production-ready app. Its roles include:

  1. An experiment field for Metal/MPSGraph operators;
  2. CPU-GPU output comparison and validation;
  3. A skeleton prototype for future local inference engines.

It is not a complete chat app, a Qwen runtime, or a mature, well-tested project.

Section 03

Technical Architecture & Core Transformer Operators

Cadence uses Apple's native tech stack: Swift 5 (language), SwiftUI (UI), and MPSGraph (GPU acceleration). Key components:

  • Device management (MTLDevice, command queue, MPSGraphDevice) in Device.swift;
  • Tensor utilities (data conversion) in TensorUtils.swift.

Implemented Transformer operators:

  • Attention: single-head, multi-head (with causal mask), and GQA;
  • RoPE: precomputed cos/sin tables and their application;
  • Normalization: RMSNorm (with debug values) and LayerNorm;
  • Activation: SwiGLU.

Tokenizer: ByteShadowMap (byte-level reversible encoding, a foundation for BPE).
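As a concrete illustration of the CPU-side references such GPU operators are checked against, here is a minimal RMSNorm sketch (x · w / √(mean(x²) + ε)); the function name and signature are illustrative assumptions, not taken from Cadence's actual code:

```swift
import Foundation

// Minimal CPU reference for RMSNorm: out[i] = x[i] * weight[i] / sqrt(mean(x^2) + eps).
// Illustrative sketch only; not Cadence's actual API.
func rmsNorm(_ x: [Float], weight: [Float], eps: Float = 1e-6) -> [Float] {
    precondition(x.count == weight.count, "x and weight must match in length")
    // Mean of squares over the feature dimension.
    let meanSquare = x.reduce(0) { $0 + $1 * $1 } / Float(x.count)
    let scale = 1 / sqrt(meanSquare + eps)
    // Normalize, then apply the learned per-channel weight.
    return zip(x, weight).map { $0.0 * scale * $0.1 }
}
```

A GPU implementation built with MPSGraph can then be compared element-wise against such a reference within a small tolerance.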

Section 04

Validation & Testing Methods

Cadence uses manual test runners (not XCTest) that are compiled into the app and invoked from CadenceApp.init(). Current tests:

  • MatmulTest: CPU vs GPU consistency;
  • RMSNormTest, RoPETest (numeric/property), LayerNormTest, SWiGLUTest;
  • AttentionTest (single/multi/GQA), AttentionPerfTest (CPU-GPU performance);
  • ByteShadowMapTest (round-trip encoding).

This approach trades test-framework rigor for flexibility, which suits the early R&D phase.
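The comparison step at the heart of these CPU-GPU consistency tests can be sketched as a simple element-wise tolerance check; `allClose` and its tolerance defaults are assumptions for illustration, not Cadence's actual helper:

```swift
import Foundation

// Element-wise closeness check between a CPU reference and a GPU result,
// in the spirit of the manual test runners described above. Illustrative only.
func allClose(_ cpu: [Float], _ gpu: [Float],
              absTol: Float = 1e-4, relTol: Float = 1e-3) -> Bool {
    guard cpu.count == gpu.count else { return false }
    // Combined absolute + relative tolerance, similar to numpy.allclose.
    return zip(cpu, gpu).allSatisfy { abs($0.0 - $0.1) <= absTol + relTol * abs($0.1) }
}
```

GPU math in float16 typically needs looser tolerances than float32, so the defaults would be tuned per operator.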

Section 05

Current Limitations & Model Assets

Model assets: Qwen3.5-4B files (tokenizer config, vocab, merges, and partial safetensors) are present but not loaded or used (the first safetensors shard is missing). Unimplemented features: the UI is still the default Hello World; there is no safetensors reader, no tokenizer parsing, no end-to-end Transformer block, and no logits sampling or text generation; tests are not in XCTest.


Section 06

Future Development Directions

Next steps for Cadence:

  1. Add safetensors weight loading;
  2. Parse tokenizer vocab and BPE rules;
  3. Combine operators into full Transformer blocks;
  4. Add embedding layer, LM head, KV cache;
  5. Build end-to-end pipeline (prompt → tokens → logits → sampling → text);
  6. Migrate tests to XCTest and add benchmarks.
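The final stage of step 5 (logits → next token) can be sketched with the simplest strategy, greedy argmax; temperature and top-k/top-p sampling would extend it. The function below is a hypothetical illustration, not part of Cadence yet:

```swift
import Foundation

// Greedy decoding: pick the token id with the highest logit.
// Hypothetical sketch of the sampling stage; not part of Cadence yet.
func greedySample(logits: [Float]) -> Int {
    precondition(!logits.isEmpty, "logits must be non-empty")
    var best = 0
    for (i, v) in logits.enumerated() where v > logits[best] {
        best = i
    }
    return best
}
```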

Section 07

Project Significance & Value

Cadence's value:

  • Demonstrates that Swift + MPSGraph can implement the core Transformer operators needed for end-to-end AI on Apple platforms;
  • Offers a lightweight CPU-GPU comparison method for validating operator correctness;
  • Serves as an open-source resource for learning Metal/MPSGraph and LLM inference on Apple Silicon;
  • Its clear code structure and complete operator implementations make it good learning material for developers interested in Apple-native LLM inference.