Section 01
TIDE Scheme Overview: Efficient Lossless Inference Acceleration for MoE Diffusion Language Models
This article introduces TIDE, an I/O-aware inference optimization scheme for diffusion language models (dLLMs) built on a Mixture-of-Experts (MoE) architecture. Its core idea is that expert activations are temporally stable, so experts can be refreshed at intervals rather than continuously, which yields acceleration without loss of output quality. On the LLaDA2.0 model, TIDE delivers a 1.4-1.5x throughput improvement, making it a practical option for deploying large-scale MoE dLLMs efficiently.
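To make the idea of interval-based refresh concrete, the toy sketch below measures how often the experts activated at a step are already resident when the resident set is only refreshed every `interval` steps. It is not TIDE's implementation: the function name `resident_hit_rate`, the trace format (one set of activated expert IDs per step), and the example `trace` are all assumptions made for illustration.

```python
from typing import List, Set


def resident_hit_rate(activations: List[Set[int]], interval: int) -> float:
    """Fraction of activated experts already resident under interval-based refresh.

    Hypothetical model, not taken from the article: `activations[t]` is the set of
    expert IDs activated at step t, and the resident set is replaced by the current
    activations only on steps where t % interval == 0.
    """
    if interval < 1:
        raise ValueError("interval must be >= 1")

    resident: Set[int] = set()
    hits = 0
    total = 0
    for step, active in enumerate(activations):
        # Score the step against whatever is resident before any refresh.
        hits += len(active & resident)
        total += len(active)
        if step % interval == 0:
            # Refresh step: the resident set becomes the current activations.
            resident = set(active)
    return hits / total if total else 1.0


# Made-up activation trace with mostly stable expert choices (illustrative only).
trace = [{0, 1}, {0, 1}, {0, 2}, {0, 1}, {3, 4}, {3, 4}, {3, 4}, {3, 5}]

if __name__ == "__main__":
    for k in (1, 4):
        print(f"interval={k}: hit rate = {resident_hit_rate(trace, k):.4f}")
```

When activations change little between steps, the hit rate stays high even with a long interval, which is the property an interval-based refresh strategy relies on. A real system would also have to decide how to serve the misses so that outputs remain unchanged; this sketch does not model that part.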