Zing Forum


Mac LLM Bench: A Community Project for Apple Silicon Local LLM Performance Benchmarking

A community-driven performance benchmark database for large language models (LLMs) running locally on Apple Silicon Macs. It supports all M1-M5 chip series, covers 14 models including Gemma 3, Qwen 3, and DeepSeek R1 Distill, and provides standardized, reproducible performance testing protocols.

Tags: Apple Silicon, Mac, LLM benchmarking, performance testing, llama.cpp, Gemma 3, Qwen 3, DeepSeek, local inference
Published 2026-04-06 19:14 · Recent activity 2026-04-06 19:22 · Estimated read: 6 min

Section 01

Mac LLM Bench: Introduction to the Apple Silicon Local LLM Performance Benchmark Community Project

Mac LLM Bench is a community-driven performance benchmark database for large language models (LLMs) running locally on Apple Silicon Macs. It supports all M1-M5 chip series, covers 14 models including Gemma 3, Qwen 3, and DeepSeek R1 Distill, and provides standardized, reproducible performance testing protocols. The project exists to answer a practical question: which LLMs and configurations suit a given Mac? By crowdsourcing results, it builds a comprehensive performance map that lets users look up how fast a specific model runs on their device and which configuration works best.


Section 02

Project Background and Core Objectives

Apple Silicon now spans five generations of product lines (M1-M5), each with base, Pro, Max, and Ultra variants and memory configurations ranging from 8GB to 256GB. Combined with the diversity of LLM models and quantization schemes, this makes it hard for ordinary users to know which models their Mac can run, and at what speed. The project's core objective is a comprehensive, reproducible performance database that lets users look up how fast a specific LLM runs on their Mac and find the optimal configuration, with community contributions forming a crowdsourced performance map.


Section 03

Technical Architecture and Testing Methods

The project uses llama-bench from llama.cpp as the core testing tool because its test content is neutral and fully reproducible. Primary metrics are prompt-processing speed (pp128/256/512, in tokens per second) and text-generation speed (tg128/256, also tokens per second); auxiliary metrics are peak memory usage (measured via /usr/bin/time) and, optionally, perplexity on WikiText-2.
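As a rough sketch of how these metrics map onto llama-bench flags (the bench.sh wrapper handles this for you; the model path below is a placeholder, and flag behavior should be checked against your installed llama.cpp version):

```shell
# Assumes llama.cpp is installed and MODEL points at a downloaded GGUF file.
MODEL=path/to/model.gguf

# pp128/256/512 are prompt-processing sizes, tg128/256 are generation lengths;
# -p and -n accept comma-separated lists, so one run covers all of them.
llama-bench -m "$MODEL" -p 128,256,512 -n 128,256

# Peak memory: on macOS, /usr/bin/time -l reports the maximum resident set size.
/usr/bin/time -l llama-bench -m "$MODEL" -p 512 -n 128
```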


Section 04

Supported Models and Quantization Schemes

The project covers 14 models from three major model families (no HuggingFace login required for download): Gemma 3 (1B/4B/12B/27B), Qwen 3 (0.6B-32B including 30B-A3B MoE), DeepSeek R1 Distill (7B/14B/32B). You can view models via ./bench.sh --list, and use --sweep or --sweep-full to automatically find the optimal quantization configuration and layer count.
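In practice, the discovery-and-tuning workflow described above looks roughly like this (exact arguments per the repo's README; the comments restate the behavior claimed above):

```shell
# List the 14 supported models and their quantization options.
./bench.sh --list

# Sweep quantization schemes to find the best-performing configuration.
./bench.sh --sweep

# Exhaustive sweep over quantization and layer counts; slower but thorough.
./bench.sh --sweep-full
```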


Section 05

Hardware Coverage and Quick Usage Guide

Hardware coverage includes all Apple Silicon series (M1-M5 variants with different core/memory configurations), and results are stored in directories grouped by chip generation. The barrier to entry is low: an Apple Silicon Mac running macOS, plus llama.cpp (installed via Homebrew) and huggingface-hub (via pip). Quick testing takes three steps: git clone the project, cd into it, and run ./bench.sh --quick. The --auto mode tests all compatible models, and python3 scripts/generate_results.py generates a results table.
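The steps above condense into a short session (package names and flags as stated in the article; network access and Homebrew assumed):

```shell
# Prerequisites (Apple Silicon Mac running macOS).
brew install llama.cpp
pip install huggingface-hub

# Three-step quick benchmark.
git clone https://github.com/enescingoz/mac-llm-bench
cd mac-llm-bench
./bench.sh --quick

# Optional: test every model your machine can run, then build the results table.
./bench.sh --auto
python3 scripts/generate_results.py
```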


Section 06

Community Contribution and Data Quality Assurance

The project uses an open-source collaboration model: after running the benchmarks, users contribute results via pull request. The process is standardized through CONTRIBUTING.md, results must conform to a strict JSON format (schemas/result.schema.json), automated scripts generate unified tables, and raw data is organized by chip model, core configuration, and so on to ensure data quality.
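A contributor can sanity-check a result file locally before opening a PR. The field names below are hypothetical illustrations only; the authoritative format is schemas/result.schema.json in the repo:

```shell
# Write a hypothetical result file. These field names are illustrative --
# consult schemas/result.schema.json for the project's actual schema.
cat > /tmp/mac-llm-bench-result.json <<'EOF'
{
  "chip": "M3 Max",
  "memory_gb": 64,
  "model": "gemma-3-12b",
  "quant": "Q4_K_M",
  "pp512_tps": 850.0,
  "tg128_tps": 42.5
}
EOF

# Minimal sanity check: is the file well-formed JSON?
python3 -m json.tool /tmp/mac-llm-bench-result.json > /dev/null && echo "well-formed"
```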


Section 07

Project Value and Future Outlook

Project value: the benchmark establishes a standardized evaluation framework for the Apple Silicon platform, helping ordinary users choose devices and models, developers optimize performance, and researchers gauge the platform's competitiveness; it serves as infrastructure for edge computing and local AI development.

Future outlook: fill in missing M1-M4 data, expand the model families covered, and incorporate improvement suggestions from the community.

Participation: start with a --quick test and submit complete test results. Project URL: https://github.com/enescingoz/mac-llm-bench.