Section 01
Domino Framework Overview: A New Breakthrough in Speculative Decoding with Decoupled Causal Modeling and Autoregressive Drafting
Domino is a framework for speculative decoding. It decouples the parallel drafting backbone from a lightweight causal refinement module (the Domino Head), which eases the trade-off between draft quality and drafting cost that limits traditional speculative decoding. On the Qwen3 model, Domino achieves a throughput speedup of up to 5.8x, improving the inference efficiency of large models.
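For readers new to the technique, the following is a minimal sketch of the generic draft-then-verify loop that speculative decoding builds on, using greedy acceptance. It is not Domino's implementation: the function names (`speculative_step`, `generate`) and the toy models (`toy_target`, `toy_draft`) are hypothetical stand-ins chosen for illustration, and the Domino Head and parallel drafting backbone are not modeled here.

```python
from typing import Callable, List, Tuple

Token = int
NextTokenFn = Callable[[List[Token]], Token]


def speculative_step(prefix: List[Token], draft_next: NextTokenFn,
                     target_next: NextTokenFn, k: int) -> List[Token]:
    """Draft k tokens cheaply, then keep the longest prefix the target agrees with."""
    # 1) Drafting: the cheap model proposes k tokens.
    drafted: List[Token] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft_next(ctx)
        drafted.append(tok)
        ctx.append(tok)

    # 2) Verification: a real system scores all k positions in one batched
    #    target forward pass; this toy version loops for clarity.
    accepted: List[Token] = []
    ctx = list(prefix)
    for tok in drafted:
        expected = target_next(ctx)
        if tok != expected:
            accepted.append(expected)  # replace the first wrong draft token
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # All drafts accepted: the target's next token comes for free.
    accepted.append(target_next(ctx))
    return accepted


def generate(prefix: List[Token], draft_next: NextTokenFn,
             target_next: NextTokenFn, k: int,
             max_new: int) -> Tuple[List[Token], int]:
    """Run speculative steps until max_new tokens exist; return tokens and step count."""
    out = list(prefix)
    steps = 0
    while len(out) - len(prefix) < max_new:
        out.extend(speculative_step(out, draft_next, target_next, k))
        steps += 1
    return out[: len(prefix) + max_new], steps


# Hypothetical toy models: the target counts upward modulo 10,
# and the draft model agrees except right after token 4.
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


def greedy_reference(prefix: List[Token], max_new: int) -> List[Token]:
    """Plain one-token-at-a-time decoding with the target, for comparison."""
    out = list(prefix)
    for _ in range(max_new):
        out.append(toy_target(out))
    return out
```

With greedy acceptance, the output matches what the target model would have produced on its own; any speedup comes from needing fewer verification steps than generated tokens. How many draft tokens are accepted per step depends on draft quality, which is the side of the trade-off that a refinement module such as the Domino Head targets.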