Section 01
Introduction: Core Overview of the Predict-then-Diffuse Framework
Diffusion large language models (D-LLMs) offer significant advantages in throughput and GPU utilization thanks to their parallel generation mechanism. However, they decode into a fixed-length output canvas: if the canvas is longer than the response needs, compute is wasted on padding tokens; if it is shorter, output quality degrades. The Predict-then-Diffuse framework resolves this dilemma with a two-stage strategy: first predict the response length, then run diffusion generation over a canvas of that size. This substantially reduces inference FLOP overhead while maintaining output quality.
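The two-stage flow above can be sketched in miniature. This is a toy illustration, not the framework's actual implementation: `predict_length` stands in for a learned length predictor, and `diffuse` mimics iterative parallel unmasking over a canvas sized by that prediction; all names and the token "vocabulary" are hypothetical.

```python
import random

MASK = "<mask>"

def predict_length(prompt: str, max_len: int = 32) -> int:
    # Toy stand-in for a trained length predictor: scale with prompt
    # size. A real system would run a small model over the prompt.
    return min(max_len, 4 + 2 * len(prompt.split()))

def diffuse(prompt: str, length: int, steps: int = 4,
            vocab=("alpha", "beta", "gamma")) -> list:
    rng = random.Random(0)
    # Stage 2: start from a fully masked canvas of the predicted size,
    # so no compute is spent on positions beyond the needed length.
    seq = [MASK] * length
    per_step = max(1, length // steps)
    for _ in range(steps):
        masked = [i for i, t in enumerate(seq) if t == MASK]
        if not masked:
            break
        # Each step fills a batch of positions in parallel,
        # mimicking a diffusion denoising iteration.
        for i in rng.sample(masked, min(per_step, len(masked))):
            seq[i] = rng.choice(vocab)
    # Resolve any positions left masked after the final step.
    for i, t in enumerate(seq):
        if t == MASK:
            seq[i] = rng.choice(vocab)
    return seq

prompt = "explain diffusion decoding"
n = predict_length(prompt)      # Stage 1: choose the canvas size
out = diffuse(prompt, n)        # Stage 2: generate within it
```

The key saving is that the diffusion loop only ever touches `n` positions rather than a worst-case fixed canvas, which is where the FLOP reduction comes from.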