Section 01
【Original Post】Inference Across Metal: A Streaming Inference Breakthrough for Running 27B LLMs on 16GB Apple Silicon
Inference Across Metal is a high-performance inference framework written in Swift and Metal. Using custom kernels and streaming techniques, it lets Apple Silicon devices with 16GB of memory run 27B-parameter large language models, working around the memory limits of that hardware. The project is maintained by MidasMulli, with source code hosted on GitHub (https://github.com/MidasMulli/inference-across-metal), and was released on May 30, 2026.
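To put the 16GB figure in context, here is a rough back-of-the-envelope sketch of how much memory the weights of a 27B-parameter model take at a few common precisions. The precisions and the helper name `weight_footprint_gb` are illustrative assumptions, not details taken from the project.

```python
# Rough weight footprint for a 27B-parameter model.
# The precisions listed are illustrative assumptions, not the project's settings.
PARAMS = 27e9
DEVICE_MEMORY_GB = 16  # unified memory, shared with the OS and other apps


def weight_footprint_gb(params: float, bits_per_weight: float) -> float:
    """Return the size of the weights alone in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


for label, bits in [("fp16", 16), ("int8", 8), ("4-bit", 4)]:
    gb = weight_footprint_gb(PARAMS, bits)
    print(f"{label}: {gb:.1f} GB of weights vs {DEVICE_MEMORY_GB} GB of device memory")
```

Under these assumptions, 16-bit and 8-bit weights exceed 16GB outright, and 4-bit weights leave only a small margin for the KV cache, activations, and the operating system. That is the kind of memory pressure a streaming approach would need to relieve.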