Section 01
dLLM-Cache: A Guide to Accelerating Diffusion Large Language Models via Adaptive Caching
dLLM-Cache is an open-source, PyTorch-based project that tackles the slow inference of diffusion large language models (dLLMs) with an adaptive caching mechanism. Without modifying the model architecture, it dynamically adjusts its caching strategy to cut redundant computation across denoising steps. The result is faster inference and lower compute cost, which in turn makes real-time applications and edge deployment more practical.
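To make the idea concrete, here is a minimal sketch of what adaptive feature caching across denoising steps could look like in PyTorch. Features computed at one step are reused at later steps, and only the tokens whose inputs have drifted the most are recomputed. All names here (`adaptive_cache_step`, `refresh_ratio`, the token-wise `layer`) are illustrative assumptions rather than the project's actual API, and cross-token attention is ignored for brevity.

```python
import torch

@torch.no_grad()  # inference-only sketch
def adaptive_cache_step(
    layer,                        # token-wise feature map (e.g., an MLP block)
    hidden: torch.Tensor,         # (seq_len, d_model) hidden states at this step
    cache: dict,                  # per-layer cache carried across denoising steps
    refresh_ratio: float = 0.25,  # fraction of tokens to recompute each step
) -> torch.Tensor:
    """Recompute features only for the tokens whose inputs drifted most;
    reuse cached features for everything else. Hypothetical sketch."""
    if "features" not in cache:
        # First denoising step: compute everything and fill the cache.
        cache["inputs"] = hidden.clone()
        cache["features"] = layer(hidden)
        return cache["features"]

    # Per-token cosine similarity between current and cached inputs:
    # low similarity means the token changed and its features are stale.
    sim = torch.cosine_similarity(hidden, cache["inputs"], dim=-1)
    k = max(1, int(refresh_ratio * hidden.size(0)))
    stale = torch.topk(-sim, k).indices  # indices of the k most-changed tokens

    # Recompute only the stale tokens; reuse the rest from the cache.
    cache["features"][stale] = layer(hidden[stale])
    cache["inputs"][stale] = hidden[stale]
    return cache["features"]

# Toy usage, with a linear layer standing in for a real dLLM layer:
layer = torch.nn.Linear(64, 64)
cache = {}
for step in range(8):             # denoising steps
    hidden = torch.randn(16, 64)  # would come from the dLLM
    feats = adaptive_cache_step(layer, hidden, cache)
```

In this sketch the similarity test is what makes the caching adaptive: tokens that barely change between steps keep their cached features, so the per-step cost scales with `refresh_ratio` rather than with the full sequence length.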