Section 01
Introduction: The Practical Value of Speculative Decoding on Apple Silicon
The berezucc/speculative-decoding project is a roughly 200-line PyTorch implementation of speculative decoding on an Apple M2 Max. It shows how relative inference throughput was raised from 0.83× (a slowdown) to 1.16× (a speedup), and it records the key decisions and failed attempts made along the way. The result is a detailed engineering note on turning theory into practice.
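For readers new to the technique, the draft-then-verify loop behind speculative decoding can be sketched in a few lines. This is a minimal illustration of the greedy variant only, not the project's code: the `NextToken` interface, the function names, and the two toy "models" are all hypothetical stand-ins, and a real implementation would verify all drafted positions in one batched forward pass of the target model rather than in a Python loop.

```python
from typing import Callable, List

# Hypothetical interface (an assumption for this sketch):
# a "model" maps a token context to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_step(target: NextToken, draft: NextToken,
                     ctx: List[int], k: int) -> List[int]:
    """Return the tokens committed by one draft-then-verify round."""
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposed: List[int] = []
    for _ in range(k):
        proposed.append(draft(ctx + proposed))

    # 2. The target model checks each proposed position in order.
    #    (A real implementation scores all k positions in one forward pass.)
    committed: List[int] = []
    for tok in proposed:
        expected = target(ctx + committed)
        if tok != expected:
            # First mismatch: keep the target's token and discard the rest.
            committed.append(expected)
            return committed
        committed.append(tok)

    # 3. Every draft token was accepted: the target yields one bonus token.
    committed.append(target(ctx + committed))
    return committed


def generate(target: NextToken, draft: NextToken,
             prompt: List[int], n_new: int, k: int = 4) -> List[int]:
    """Generate n_new tokens using repeated speculative rounds."""
    out = list(prompt)
    while len(out) - len(prompt) < n_new:
        out.extend(speculative_step(target, draft, out, k))
    return out[: len(prompt) + n_new]


def greedy(target: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Plain autoregressive greedy decoding, used as the reference."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(target(out))
    return out


# Toy stand-ins for real models (assumptions, for illustration only).
def toy_target(ctx: List[int]) -> int:
    return (ctx[-1] * 3 + 1) % 7


def toy_draft(ctx: List[int]) -> int:
    # Agrees with the target except after token 5, to force rejections.
    return 0 if ctx[-1] == 5 else toy_target(ctx)
```

Each round commits between 1 and k + 1 tokens, and in this greedy setting the output matches plain greedy decoding with the target model. Any throughput gain therefore depends on how often the draft is accepted and on how cheap the draft model is relative to the target, which is consistent with a naive setup landing below 1× before tuning.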