Fast-dLLM: Training-Free Acceleration of Diffusion LLM

原始链接: https://arxiv.org/abs/2505.22618

## Fast-dLLM: Accelerating Diffusion Language Models

This paper introduces **Fast-dLLM**, a method that significantly accelerates diffusion-based large language models (Diffusion LLMs) without any retraining. Although Diffusion LLMs promise faster parallel text generation, they have historically been slower than conventional autoregressive models because they lack an efficient cache and suffer quality loss during parallel decoding.

Fast-dLLM addresses these problems with two key innovations: a **block-wise approximate KV cache** that allows previously computed results to be reused effectively, and a **confidence-aware parallel decoding strategy**. The latter selectively decodes tokens based on a confidence threshold, preserving the critical token dependencies that parallel processing usually breaks; a minimal sketch of this rule follows below.

Experiments on the LLaDA and Dream models show **up to a 27.6× throughput improvement** with minimal impact on accuracy, effectively closing the performance gap with autoregressive models and making Diffusion LLMs a more viable option for practical applications.
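To make the confidence-aware rule concrete, here is a minimal Python sketch of one parallel decoding step for a masked-diffusion LM. The model interface, the `mask_positions` bookkeeping, and the threshold value are illustrative assumptions, not the paper's implementation.

```python
import torch

def confidence_parallel_decode_step(model, tokens, mask_positions, threshold=0.9):
    """One denoising step: decode in parallel only the masked positions whose
    top-1 probability exceeds `threshold`; always decode at least the single
    most confident position so the loop makes progress."""
    with torch.no_grad():
        logits = model(tokens).logits          # assumed shape [1, seq_len, vocab]
    probs = torch.softmax(logits[0].float(), dim=-1)
    conf, pred = probs.max(dim=-1)             # per-position confidence and argmax token

    masked = torch.tensor(mask_positions, dtype=torch.long)
    accept = conf[masked] >= threshold
    if not accept.any():                       # fallback: greedy single-token step
        accept[conf[masked].argmax()] = True

    for pos in masked[accept]:                 # fill the accepted positions
        tokens[0, pos] = pred[pos]

    still_masked = [p for p, a in zip(mask_positions, accept.tolist()) if not a]
    return tokens, still_masked
```

A full generation loop would call this step repeatedly until nothing in the current block remains masked, which is where the threshold trades parallelism against quality.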

The Hacker News discussion centers on the new paper "Fast-dLLM: Training-free Acceleration of Diffusion LLM" and how to speed up diffusion language models (dLLMs). One commenter initially questioned the paper's claim that inference is *slower* than conventional architectures, since their prior experience suggested dLLMs were faster. The explanation lies in how dLLMs typically work: **bidirectional generation**, which requires many denoising steps over the entire generation window. The paper details how to mitigate this. It focuses on **dynamically adjusting parallel token generation**, producing several tokens at once while preserving output quality, and it also introduces a new **KV cache strategy** to further accelerate this parallel decoding. In essence, the work aims to make dLLMs more efficient without any retraining.

Original article

Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding, by Chengyue Wu and 8 other authors

Abstract: Diffusion-based large language models (Diffusion LLMs) have shown promise for non-autoregressive text generation with parallel decoding capabilities. However, the practical inference speed of open-sourced Diffusion LLMs often lags behind autoregressive models due to the lack of Key-Value (KV) Cache and quality degradation when decoding multiple tokens simultaneously. To bridge this gap, we introduce a novel block-wise approximate KV Cache mechanism tailored for bidirectional diffusion models, enabling cache reuse with negligible performance drop. Additionally, we identify the root cause of generation quality degradation in parallel decoding as the disruption of token dependencies under the conditional independence assumption. To address this, we propose a confidence-aware parallel decoding strategy that selectively decodes tokens exceeding a confidence threshold, mitigating dependency violations and maintaining generation quality. Experimental results on LLaDA and Dream models across multiple LLM benchmarks demonstrate up to **27.6× throughput** improvement with minimal accuracy loss, closing the performance gap with autoregressive models and paving the way for practical deployment of Diffusion LLMs.
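The block-wise cache reuse pattern from the abstract can be sketched as follows, against a hypothetical Hugging Face-style `model` that accepts `past_key_values`; the bidirectional caching details and the confidence-aware unmasking rule from the paper are simplified to keep the example short.

```python
import torch

MASK_ID = 126336  # placeholder mask-token id; model-specific in practice

def generate_with_block_cache(model, prompt_ids, gen_len=128, block_len=32):
    """Decode `gen_len` tokens block by block. The KV entries for everything
    before the current block are computed once per block and reused for every
    denoising step inside that block, then refreshed for the next block."""
    seq = torch.cat(
        [prompt_ids, torch.full((1, gen_len), MASK_ID, dtype=torch.long)], dim=1)
    end = seq.shape[1]

    for b0 in range(prompt_ids.shape[1], end, block_len):
        b1 = min(b0 + block_len, end)
        # Approximate cache: one full forward over the prefix per block.
        kv = model(seq[:, :b0], use_cache=True).past_key_values

        while (seq[0, b0:b1] == MASK_ID).any():
            out = model(seq[:, b0:b1], past_key_values=kv)   # block-only forward
            conf, pred = out.logits[0].softmax(-1).max(-1)
            masked = (seq[0, b0:b1] == MASK_ID).nonzero(as_tuple=True)[0]
            best = masked[conf[masked].argmax()]             # greedy: one token per step
            seq[0, b0 + best] = pred[best]
    return seq
```

The point of the pattern is that the expensive forward pass over the already-decoded prefix happens once per block rather than once per denoising step; the paper's bidirectional variant additionally reuses cache for the masked suffix, which this sketch omits.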
From: Chengyue Wu
[v1] Wed, 28 May 2025 17:39:15 UTC (272 KB)
[v2] Wed, 2 Jul 2025 05:11:54 UTC (502 KB)
[v3] Thu, 3 Jul 2025 04:51:05 UTC (541 KB)