Section 01
[Introduction] FlexDraft: Core Innovations and Value of the Flexible Speculative Decoding Framework
FlexDraft is a lossless speculative decoding framework. Traditional speculative decoding methods suffer a performance collapse in large-batch scenarios. FlexDraft addresses this by adapting to varying batch sizes through three key designs: attention fine-tuning, reward token-guided calibration, and dynamic decoding strategy switching. Together, these improve LLM inference efficiency without sacrificing output quality.
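
To make the term "lossless" concrete, here is a minimal sketch of generic greedy-verification speculative decoding. It is not FlexDraft's own method, whose three designs are not detailed in this section. The names `greedy_verify` and `toy_target` are hypothetical, and the target model is stood in for by a plain function that returns the next token for a given context. The point is only that every emitted token is one the target model would have chosen itself, so the output matches ordinary greedy decoding.

```python
from typing import Callable, List, Sequence

# Hypothetical type: a "target model" that returns its greedy next token
# for a given context. A real system would score all draft positions in
# one batched forward pass; it is called step by step here for clarity.
TargetFn = Callable[[Sequence[int]], int]


def greedy_verify(prefix: Sequence[int],
                  draft: Sequence[int],
                  target_next: TargetFn) -> List[int]:
    """Return the tokens to append after verifying a draft against the target.

    Draft tokens are accepted while they equal the target's greedy choice.
    At the first mismatch the target's token is emitted instead and the rest
    of the draft is discarded. If every draft token is accepted, one extra
    target token is appended.
    """
    context = list(prefix)
    accepted: List[int] = []
    for tok in draft:
        expected = target_next(context)
        if tok != expected:
            accepted.append(expected)  # correction from the target model
            return accepted
        accepted.append(tok)
        context.append(tok)
    accepted.append(target_next(context))  # bonus token after full acceptance
    return accepted


def toy_target(context: Sequence[int]) -> int:
    """Toy stand-in for a target model: next token is (last + 1) mod 10."""
    return (context[-1] + 1) % 10


if __name__ == "__main__":
    # The draft is right for two tokens, then wrong; the third is corrected.
    print(greedy_verify([1, 2, 3], [4, 5, 9], toy_target))  # [4, 5, 6]
```

In this sketch the speed-up comes from accepting several draft tokens per verification step, while correctness never depends on draft quality: a poor draft only reduces how many tokens are accepted.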