Zing Forum


EdgeRunner: A Local LLM Inference Engine for Apple Silicon Implemented Purely in Swift

EdgeRunner is a local large language model (LLM) inference engine built entirely with Swift and Metal, optimized specifically for Apple Silicon. It supports direct loading of GGUF format models without conversion or C++ dependencies, achieving a decoding speed of over 230 tokens per second on the M3 Max.

Tags: Swift, Metal, Apple Silicon, Local Inference, GGUF, LLM, Edge Computing, Privacy
Published 2026-05-06 08:41 · Recent activity 2026-05-06 08:52 · Estimated read 1 min

Section 01

Introduction / Main Post: EdgeRunner: A Local LLM Inference Engine for Apple Silicon Implemented Purely in Swift

EdgeRunner is a local large language model (LLM) inference engine built entirely with Swift and Metal, optimized specifically for Apple Silicon. It supports direct loading of GGUF format models without conversion or C++ dependencies, achieving a decoding speed of over 230 tokens per second on the M3 Max.
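Loading GGUF files directly means parsing the format's binary layout in Swift rather than going through llama.cpp. As a minimal sketch of what that entails, the snippet below reads the fixed GGUF file header (magic bytes `GGUF`, a version, a tensor count, and a metadata key-value count, all little-endian) per the public GGUF specification. This is illustrative only and is not EdgeRunner's actual loader; all names here are hypothetical.

```swift
import Foundation

// Fixed-size GGUF header fields, per the public GGUF spec:
// 4-byte magic "GGUF", then uint32 version, uint64 tensor count,
// uint64 metadata key-value count (all little-endian).
struct GGUFHeader {
    let version: UInt32
    let tensorCount: UInt64
    let metadataKVCount: UInt64
}

enum GGUFError: Error { case badMagic, truncated }

func readGGUFHeader(at url: URL) throws -> GGUFHeader {
    let data = try Data(contentsOf: url)
    guard data.count >= 24 else { throw GGUFError.truncated }

    // Magic must be the ASCII bytes "GGUF".
    guard data.prefix(4).elementsEqual("GGUF".utf8) else {
        throw GGUFError.badMagic
    }

    // Read a little-endian fixed-width integer at a byte offset.
    func readLE<T: FixedWidthInteger>(_: T.Type, at offset: Int) -> T {
        let raw: T = data.subdata(in: offset..<offset + MemoryLayout<T>.size)
            .withUnsafeBytes { $0.loadUnaligned(as: T.self) }
        return T(littleEndian: raw)
    }

    return GGUFHeader(
        version: readLE(UInt32.self, at: 4),
        tensorCount: readLE(UInt64.self, at: 8),
        metadataKVCount: readLE(UInt64.self, at: 16)
    )
}
```

After the header, a real loader would walk the metadata key-value pairs and tensor-info records, then memory-map the tensor data region; the header parse above is only the first step.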