Section 01
Introduction: The TIDE Framework Enables Cross-Architecture Distillation, Significantly Boosting the Performance of Small dLLMs
Diffusion Language Models (dLLMs) offer advantages in parallel decoding and bidirectional context modeling, but their performance has remained tightly bound to parameter scale. The TIDE framework is the first to achieve cross-architecture knowledge distillation, distilling knowledge from an 8B dense model and a 16B MoE model into a lightweight 0.6B student. On the HumanEval code-generation benchmark, the student's score jumps from 32.3 to 48.78, breaking the scale bottleneck that has limited the practical deployment of dLLMs.
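To make the idea concrete, the sketch below shows a generic token-level knowledge-distillation objective of the kind such a framework builds on: the student is trained to match the teacher's softened output distribution via KL divergence. This is a minimal illustration, not TIDE's actual loss; the article does not specify how TIDE aligns an autoregressive teacher's next-token predictions with a diffusion student's denoising predictions, and the function name, temperature value, and shape assumptions here are all hypothetical.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """Generic KD loss: KL divergence between the teacher's and the
    student's softened token distributions.

    Both logit tensors are assumed to have shape
    (batch, seq_len, vocab_size) and to share the same tokenizer,
    so that positions and vocabulary entries are comparable.
    """
    t = temperature
    # Soften both distributions with the same temperature.
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # KL(teacher || student); the t**2 factor keeps gradient magnitudes
    # comparable across temperatures (Hinton et al., 2015).
    kl = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    return kl * (t ** 2)
```

In a cross-architecture setting, the hard part that TIDE addresses is producing comparable teacher and student distributions in the first place, since an autoregressive teacher scores tokens left to right while a diffusion student predicts masked tokens bidirectionally; once aligned, a matching objective like the one above can be applied.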