Section 01
dLLM-Cache: An Adaptive Cache Acceleration Scheme for Diffusion Large Language Models
dLLM-Cache is an open-source PyTorch implementation for accelerating inference in diffusion large language models (dLLMs). Its core is an adaptive cache mechanism that reuses intermediate computation results across denoising steps, cutting redundant computation and significantly improving inference efficiency while preserving generation quality. By reducing the high computational cost of diffusion-based decoding, the project makes dLLMs more practical to deploy in real-world scenarios.
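To make the idea of reusing intermediate results concrete, here is a minimal, dependency-free sketch of an adaptive per-layer cache. It is not the project's actual API; the class name `AdaptiveCache`, the relative-change heuristic, and the `threshold` parameter are all illustrative assumptions. The idea it demonstrates: when a layer's input changes little between consecutive denoising steps, the previous step's output is returned instead of recomputing.

```python
# Illustrative sketch of an adaptive feature cache for diffusion-LLM
# inference. All names here are hypothetical, not dLLM-Cache's real API.

class AdaptiveCache:
    def __init__(self, threshold=0.05):
        self.threshold = threshold  # max relative input change for a cache hit
        self.store = {}             # layer id -> (cached input, cached output)
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _rel_change(prev, cur):
        # Relative L2 distance between the previous and current inputs.
        num = sum((p - c) ** 2 for p, c in zip(prev, cur)) ** 0.5
        den = sum(p ** 2 for p in prev) ** 0.5 or 1.0
        return num / den

    def layer_forward(self, layer_id, x, compute):
        cached = self.store.get(layer_id)
        if cached is not None and self._rel_change(cached[0], x) < self.threshold:
            self.hits += 1
            return cached[1]        # input barely changed: reuse cached output
        self.misses += 1
        y = compute(x)              # full recomputation for this layer
        self.store[layer_id] = (x, y)
        return y


# Simulated usage: inputs drift slowly across denoising steps, so most
# steps hit the cache and skip the (here, trivial) layer computation.
cache = AdaptiveCache(threshold=0.1)
layer = lambda v: [2.0 * u for u in v]

out1 = cache.layer_forward(0, [1.0, 1.0], layer)    # first step: cache miss
out2 = cache.layer_forward(0, [1.0, 1.01], layer)   # tiny drift: cache hit
out3 = cache.layer_forward(0, [5.0, 5.0], layer)    # large change: miss again
```

In the real project the cached objects would be layer activations (tensors), and the reuse decision can be made per component (e.g. prompt vs. response features), but the hit/miss control flow follows this same pattern.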