Section 01
SwiftLM: Apple Silicon Native High-Performance LLM Inference Server (Main Guide)
SwiftLM is a native Swift large language model (LLM) inference server for Apple Silicon, built on MLX Swift. It provides an OpenAI-compatible API, SSD streaming for ultra-large MoE models, TurboQuant KV-cache compression, and an iOS companion app. Key advantages include no Python runtime overhead, Metal GPU acceleration, and industry-leading local inference performance on macOS and iOS devices.
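As a quick illustration of the OpenAI-compatible API, the sketch below sends a single chat-completion request to a locally running SwiftLM server from Swift. The port (8080), the model identifier, and the request/response shapes follow the standard OpenAI chat-completions schema and are assumptions for illustration, not values taken from SwiftLM's documentation; adjust them to match your local configuration.

```swift
// Minimal sketch: calling a local OpenAI-compatible chat endpoint from Swift.
// Assumptions: server on localhost:8080, model id "llama-3-8b", standard
// /v1/chat/completions path. Change these to match your SwiftLM setup.
import Foundation

struct ChatMessage: Codable {
    let role: String
    let content: String
}

struct ChatRequest: Codable {
    let model: String
    let messages: [ChatMessage]
    let stream: Bool
}

struct ChatChoice: Codable {
    let message: ChatMessage
}

struct ChatResponse: Codable {
    let choices: [ChatChoice]
}

// Build the request body using the standard OpenAI chat-completions schema.
let body = ChatRequest(
    model: "llama-3-8b",   // assumed model identifier
    messages: [ChatMessage(role: "user", content: "Hello, SwiftLM!")],
    stream: false
)

var request = URLRequest(url: URL(string: "http://localhost:8080/v1/chat/completions")!)
request.httpMethod = "POST"
request.addValue("application/json", forHTTPHeaderField: "Content-Type")
request.httpBody = try? JSONEncoder().encode(body)

// A semaphore keeps this runnable as a plain command-line script.
let semaphore = DispatchSemaphore(value: 0)
URLSession.shared.dataTask(with: request) { data, _, error in
    defer { semaphore.signal() }
    guard let data = data, error == nil else {
        print("Request failed: \(error?.localizedDescription ?? "unknown error")")
        return
    }
    if let reply = try? JSONDecoder().decode(ChatResponse.self, from: data) {
        print(reply.choices.first?.message.content ?? "<empty response>")
    } else {
        print(String(data: data, encoding: .utf8) ?? "<undecodable response>")
    }
}.resume()
semaphore.wait()
```

Because the API mirrors OpenAI's schema, existing OpenAI client libraries and tools should also work by pointing their base URL at the local SwiftLM server instead of api.openai.com.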