Section 01
DFlare: Breakthrough in Block Diffusion Speculative Decoding with 5.52x Speedup on Qwen3-4B
The Tencent AngelSlim team has proposed DFlare, a block diffusion speculative decoding method that uses layer-wise fusion to scale up the capacity of the draft model. It achieves a 5.52x wall-clock speedup on Qwen3-4B, 11% higher than DFlash. By addressing a bottleneck in DFlash, the work offers a new approach to accelerating LLM inference.
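Speculative decoding speeds up generation because the target model can accept several drafted tokens in one verification step. The sketch below is only a generic illustration of greedy draft-block verification. It is not DFlare's layer-wise fusion or its actual verification code. The names `verify_draft_block` and `target_next` are hypothetical. For readability, the target model is called once per position, whereas real systems score the whole block in a single forward pass.

```python
from typing import Callable, List, Sequence


def verify_draft_block(
    draft: Sequence[int],
    target_next: Callable[[List[int]], int],
    prefix: List[int],
) -> List[int]:
    """Greedy verification of a drafted token block (generic sketch, not DFlare's method).

    `target_next` is a hypothetical stand-in for the target model's greedy
    next-token prediction given a context.
    """
    accepted: List[int] = []
    context = list(prefix)
    for tok in draft:
        expected = target_next(context)
        if tok != expected:
            # First mismatch: keep the target's own token and discard the rest of the draft.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        context.append(tok)
    # Whole block accepted: the target contributes one extra "bonus" token.
    accepted.append(target_next(context))
    return accepted
```

With a toy target that always predicts `(last + 1) % 10`, the call `verify_draft_block([2, 3, 5, 6], toy, [1])` accepts `2` and `3`, then replaces `5` with `4`. It yields three tokens for one verification step. The more draft tokens the target accepts per step, the larger the wall-clock speedup. This is why a more capable draft model can help.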