Zing Forum


EdgeRunner: A Local LLM Inference Engine for Apple Silicon Implemented Purely in Swift

EdgeRunner is a local large language model (LLM) inference engine built entirely with Swift and Metal, optimized specifically for Apple Silicon. It supports direct loading of GGUF format models without conversion or C++ dependencies, achieving a decoding speed of over 230 tokens per second on the M3 Max.

Tags: Swift, Metal, Apple Silicon, Local Inference, GGUF, LLM, Edge Computing, Privacy
Published 2026-05-06 08:41 · Recent activity 2026-05-06 08:52 · Estimated read 1 min

Section 01

Introduction / Main Post: EdgeRunner: A Local LLM Inference Engine for Apple Silicon Implemented Purely in Swift

EdgeRunner is a local large language model (LLM) inference engine built entirely with Swift and Metal, optimized specifically for Apple Silicon. It supports direct loading of GGUF format models without conversion or C++ dependencies, achieving a decoding speed of over 230 tokens per second on the M3 Max.
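Loading GGUF files directly means parsing the format's binary layout in Swift rather than going through llama.cpp. As a minimal sketch of what that entails, the snippet below reads the fixed GGUF file header (magic bytes `GGUF`, a version, a tensor count, and a metadata key-value count, all little-endian) per the public GGUF specification. This is illustrative only and is not EdgeRunner's actual loader; all names here are hypothetical.

```swift
import Foundation

// Fixed-size GGUF header fields, per the public GGUF spec:
// 4-byte magic "GGUF", then uint32 version, uint64 tensor count,
// uint64 metadata key-value count (all little-endian).
struct GGUFHeader {
    let version: UInt32
    let tensorCount: UInt64
    let metadataKVCount: UInt64
}

enum GGUFError: Error { case badMagic, truncated }

func readGGUFHeader(at url: URL) throws -> GGUFHeader {
    let data = try Data(contentsOf: url)
    guard data.count >= 24 else { throw GGUFError.truncated }

    // Magic must be the ASCII bytes "GGUF".
    guard data.prefix(4).elementsEqual("GGUF".utf8) else {
        throw GGUFError.badMagic
    }

    // Read a little-endian fixed-width integer at a byte offset.
    func readLE<T: FixedWidthInteger>(_: T.Type, at offset: Int) -> T {
        let raw: T = data.subdata(in: offset..<offset + MemoryLayout<T>.size)
            .withUnsafeBytes { $0.loadUnaligned(as: T.self) }
        return T(littleEndian: raw)
    }

    return GGUFHeader(
        version: readLE(UInt32.self, at: 4),
        tensorCount: readLE(UInt64.self, at: 8),
        metadataKVCount: readLE(UInt64.self, at: 16)
    )
}
```

After the header, a real loader would walk the metadata key-value pairs and tensor-info records, then memory-map the tensor data region; the header parse above is only the first step.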