Section 01
[Introduction] SMC-SD: Key Points of a New Speculative Decoding Acceleration Method Based on Sequential Monte Carlo
This paper proposes SMC-SD, which addresses the 'all-or-nothing' bottleneck of traditional speculative decoding by replacing token-level rejection sampling with importance-weighted resampling based on Sequential Monte Carlo (SMC). Experiments show a 2.36x speedup over speculative decoding and a 5.2x speedup over autoregressive decoding, with accuracy loss kept within 3%, offering an efficient path to LLM inference acceleration with controllable quality.
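To make the idea of importance-weighted resampling concrete, here is a minimal sketch of a generic SMC resampling step. It is not the paper's implementation. It assumes each candidate continuation ("particle") comes with a log-probability under the target model and under the draft model. The function names (`normalized_weights`, `effective_sample_size`, `resample`) are hypothetical and chosen only for illustration.

```python
import math
import random


def normalized_weights(target_logps, draft_logps):
    """Self-normalized importance weights, w_i proportional to p_target(x_i) / q_draft(x_i)."""
    log_w = [t - d for t, d in zip(target_logps, draft_logps)]
    m = max(log_w)  # subtract the max before exponentiating, for numerical stability
    unnorm = [math.exp(lw - m) for lw in log_w]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2); equals N for uniform weights, 1 if one particle dominates."""
    return 1.0 / sum(w * w for w in weights)


def resample(particles, weights, rng):
    """Multinomial resampling: draw N particles with replacement, proportional to weight."""
    return rng.choices(particles, weights=weights, k=len(particles))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical draft continuations with made-up log-probabilities.
    particles = ["draft A", "draft B", "draft C", "draft D"]
    target_logps = [math.log(p) for p in (0.40, 0.10, 0.30, 0.05)]
    draft_logps = [math.log(q) for q in (0.25, 0.25, 0.25, 0.25)]

    w = normalized_weights(target_logps, draft_logps)
    print("weights:", [round(x, 3) for x in w])
    print("ESS:", round(effective_sample_size(w), 3))
    print("resampled:", resample(particles, w, rng))
```

Unlike token-level rejection sampling, a step like this never discards a whole draft outright: poorly matched drafts are down-weighted and tend to be replaced by copies of better-matched ones. How SMC-SD schedules resampling and extends the particles is specified in the paper, not in this sketch.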