Section 01
[Introduction] Real-World LLM Inference Testing on Consumer Hardware: Counterintuitive Quantization Results on Apple Silicon
The transformers-laptop-bench project by Valerio Maggio (GitHub: https://github.com/leriomaggio/transformers-laptop-bench) benchmarks the inference cost of open-source LLMs on consumer hardware (CUDA, Apple Silicon, and CPU). Its core finding runs contrary to common intuition: on Apple Silicon, quantization not only fails to improve performance but also significantly reduces inference speed and even increases memory usage. The tests cover metrics such as time-to-first-token, total latency, throughput, and peak memory, with the aim of giving ordinary users real data to reference when running LLMs locally.
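To make the four metrics concrete, here is a minimal Python sketch of how they could be measured around any token stream. It is not the project's actual harness: `measure_stream`, `InferenceMetrics`, and `fake_stream` are hypothetical names introduced for illustration, throughput is assumed to mean generated tokens divided by total latency, and `tracemalloc` only sees Python-heap allocations, so it stands in for (and does not replace) real device or unified-memory measurement.

```python
import time
import tracemalloc
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class InferenceMetrics:
    ttft_s: float               # time until the first token arrives
    total_latency_s: float      # time until the last token arrives
    tokens: int                 # number of tokens produced
    throughput_tok_s: float     # tokens / total latency (assumed definition)
    peak_python_mem_bytes: int  # Python-heap peak only, not GPU/unified memory


def measure_stream(make_stream: Callable[[], Iterable[str]]) -> InferenceMetrics:
    """Time a token stream and record the four metrics (hypothetical helper)."""
    tracemalloc.start()
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in make_stream():
        if ttft is None:
            # First token observed: record time-to-first-token.
            ttft = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return InferenceMetrics(
        ttft_s=ttft if ttft is not None else total,
        total_latency_s=total,
        tokens=count,
        throughput_tok_s=count / total if total > 0 else 0.0,
        peak_python_mem_bytes=peak,
    )


def fake_stream(n_tokens: int = 5, delay_s: float = 0.01) -> Iterator[str]:
    """Stand-in for a model's streaming generator (illustrative only)."""
    for i in range(n_tokens):
        time.sleep(delay_s)
        yield f"tok{i}"


if __name__ == "__main__":
    print(measure_stream(lambda: fake_stream()))
```

In a real run, `fake_stream` would be replaced by whatever streaming interface the model runtime exposes, and peak memory would come from the backend's own memory counters rather than `tracemalloc`.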