Zing Forum


Mac LLM Bench: A Community Project for Apple Silicon Local LLM Performance Benchmarking

A community-driven performance benchmark database for large language models (LLMs) running locally on Apple Silicon Macs. It supports all M1-M5 chip series, covers 14 models including Gemma 3, Qwen 3, and DeepSeek R1 Distill, and provides standardized, reproducible performance testing protocols.

Tags: Apple Silicon, Mac, LLM benchmarking, performance testing, llama.cpp, Gemma 3, Qwen 3, DeepSeek, local inference
Published 2026-04-06 19:14 · Recent activity 2026-04-06 19:22 · Estimated read: 6 min

Section 01

Mac LLM Bench: Introduction to the Apple Silicon Local LLM Performance Benchmark Community Project

Mac LLM Bench is a community-driven performance benchmark database for large language models (LLMs) running locally on Apple Silicon Macs. It supports all M1-M5 chip series, covers 14 models including Gemma 3, Qwen 3, and DeepSeek R1 Distill, and provides standardized, reproducible performance testing protocols. The project exists to answer a practical question: which LLMs and configurations suit a given Mac? By crowdsourcing results, it builds a comprehensive performance map that lets users look up how fast a specific model runs on their device and which configuration works best.


Section 02

Project Background and Core Objectives

Apple Silicon now spans five generations of product lines (M1-M5), each with base, Pro, Max, and Ultra variants and memory configurations ranging from 8GB to 256GB. Combined with the diversity of LLM models and quantization schemes, this makes it hard for ordinary users to know which models their Mac can run, and at what speed. The project's core objective is a comprehensive, reproducible performance database that lets users look up how fast a specific LLM runs on their Mac and find the optimal configuration, with community contributions forming a crowdsourced performance map.


Section 03

Technical Architecture and Testing Methods

The project uses llama-bench from llama.cpp as the core testing tool because its test content is neutral and fully reproducible. Primary metrics are prompt-processing speed (pp128/256/512, in tokens per second) and text-generation speed (tg128/256, also tokens per second); auxiliary metrics are peak memory usage (measured via /usr/bin/time) and, optionally, perplexity on WikiText-2.
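As a rough sketch of how these metrics map onto llama-bench flags (the bench.sh wrapper handles this for you; the model path below is a placeholder, and flag behavior should be checked against your installed llama.cpp version):

```shell
# Assumes llama.cpp is installed and MODEL points at a downloaded GGUF file.
MODEL=path/to/model.gguf

# pp128/256/512 are prompt-processing sizes, tg128/256 are generation lengths;
# -p and -n accept comma-separated lists, so one run covers all of them.
llama-bench -m "$MODEL" -p 128,256,512 -n 128,256

# Peak memory: on macOS, /usr/bin/time -l reports the maximum resident set size.
/usr/bin/time -l llama-bench -m "$MODEL" -p 512 -n 128
```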


Section 04

Supported Models and Quantization Schemes

The project covers 14 models from three major model families (no HuggingFace login required for download): Gemma 3 (1B/4B/12B/27B), Qwen 3 (0.6B-32B including 30B-A3B MoE), DeepSeek R1 Distill (7B/14B/32B). You can view models via ./bench.sh --list, and use --sweep or --sweep-full to automatically find the optimal quantization configuration and layer count.
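In practice, the discovery-and-tuning workflow described above looks roughly like this (exact arguments per the repo's README; the comments restate the behavior claimed above):

```shell
# List the 14 supported models and their quantization options.
./bench.sh --list

# Sweep quantization schemes to find the best-performing configuration.
./bench.sh --sweep

# Exhaustive sweep over quantization and layer counts; slower but thorough.
./bench.sh --sweep-full
```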


Section 05

Hardware Coverage and Quick Usage Guide

Hardware coverage includes all Apple Silicon series (M1-M5 variants with different core/memory configurations), and results are stored in directories grouped by chip generation. The barrier to entry is low: an Apple Silicon Mac running macOS, plus llama.cpp (installed via Homebrew) and huggingface-hub (via pip). Quick testing takes three steps: git clone the project, cd into it, and run ./bench.sh --quick. The --auto mode tests all compatible models, and python3 scripts/generate_results.py generates a results table.
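The steps above condense into a short session (package names and flags as stated in the article; network access and Homebrew assumed):

```shell
# Prerequisites (Apple Silicon Mac running macOS).
brew install llama.cpp
pip install huggingface-hub

# Three-step quick benchmark.
git clone https://github.com/enescingoz/mac-llm-bench
cd mac-llm-bench
./bench.sh --quick

# Optional: test every model your machine can run, then build the results table.
./bench.sh --auto
python3 scripts/generate_results.py
```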


Section 06

Community Contribution and Data Quality Assurance

The project uses an open-source collaboration model: after running the benchmarks, users contribute results via pull request. The process is standardized through CONTRIBUTING.md, results must conform to a strict JSON format (schemas/result.schema.json), automated scripts generate unified tables, and raw data is organized by chip model, core configuration, and so on to ensure data quality.
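A contributor can sanity-check a result file locally before opening a PR. The field names below are hypothetical illustrations only; the authoritative format is schemas/result.schema.json in the repo:

```shell
# Write a hypothetical result file. These field names are illustrative --
# consult schemas/result.schema.json for the project's actual schema.
cat > /tmp/mac-llm-bench-result.json <<'EOF'
{
  "chip": "M3 Max",
  "memory_gb": 64,
  "model": "gemma-3-12b",
  "quant": "Q4_K_M",
  "pp512_tps": 850.0,
  "tg128_tps": 42.5
}
EOF

# Minimal sanity check: is the file well-formed JSON?
python3 -m json.tool /tmp/mac-llm-bench-result.json > /dev/null && echo "well-formed"
```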


Section 07

Project Value and Future Outlook

Project value: the benchmark establishes a standardized evaluation framework for the Apple Silicon platform, helping ordinary users choose devices and models, developers optimize performance, and researchers gauge the platform's competitiveness; it serves as infrastructure for edge computing and local AI development.

Future outlook: fill in missing M1-M4 data, expand the model families covered, and incorporate improvement suggestions from the community.

Participation: start with a --quick test and submit complete test results. Project URL: https://github.com/enescingoz/mac-llm-bench.