AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation

原始链接: https://arxiv.org/abs/2305.09515

This paper introduces AR-Diffusion, a novel diffusion model tailored for text generation that addresses the inherent sequential dependency of natural language. Unlike existing diffusion language models that generate all tokens simultaneously, AR-Diffusion adopts an auto-regressive approach, ensuring that the generation of tokens on the right depends on tokens already generated on the left. It achieves this by varying the number of denoising steps according to token position: tokens on the left undergo fewer steps, so they emerge earlier and influence the generation of subsequent tokens on the right. The authors demonstrate that AR-Diffusion outperforms existing diffusion language models on a range of text generation tasks, including text summarization, machine translation, and common sense generation. Moreover, the model achieves comparable results at dramatically higher speed, reportedly 100 to 600 times faster. The code for AR-Diffusion has been publicly released.
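The position-dependent denoising idea can be sketched as a per-token timestep schedule. The snippet below is an illustrative linear schedule, not the paper's exact movement function: each token's timestep decreases as decoding proceeds, but a token at position `n` lags `n * skew` decoding steps behind position 0, so left tokens reach the fully denoised state (timestep 0) first. The function name and the `skew` parameter are hypothetical.

```python
import numpy as np

def per_token_timesteps(seq_len: int, total_steps: int,
                        decode_step: int, skew: int) -> np.ndarray:
    """Illustrative position-dependent schedule (not the paper's exact rule).

    Each token's remaining noise level (its diffusion timestep) falls from
    `total_steps` toward 0 as `decode_step` advances, but position n lags
    behind position 0 by n * skew steps, so left tokens finish earlier.
    """
    positions = np.arange(seq_len)
    # Progress made by token n so far; right tokens have made less progress.
    progress = decode_step - positions * skew
    # Remaining timestep, clipped to the valid range [0, total_steps].
    return np.clip(total_steps - progress, 0, total_steps)

# Early in decoding, left tokens are further along (lower timestep):
print(per_token_timesteps(5, 100, 10, 3))    # [ 90  93  96  99 100]
# Late enough in decoding, every token has reached timestep 0:
print(per_token_timesteps(5, 100, 112, 3))   # [0 0 0 0 0]
```

With this schedule, leftmost tokens are fully generated while rightmost ones are still heavily noised, which is what lets earlier tokens condition the denoising of later ones.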

A Hacker News thread discusses the AR-Diffusion paper. User "jbellis" hopes diffusion models can scale up, suggesting speed is crucial for tasks with a minimum intelligence requirement, and cites Gemini and Inception as examples. "MarcoDewey" proposes that diffusion models could surpass traditional auto-regressive methods for code generation. Finally, "meatmanek" suggests the post's title should include the year "[2023]" for clarity. The general tone is cautiously optimistic about the potential of diffusion models compared with existing auto-regressive approaches.

Original

View a PDF of the paper titled AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation, by Tong Wu and 11 other authors

Abstract: Diffusion models have gained significant attention in the realm of image generation due to their exceptional performance. Their success has been recently expanded to text generation via generating all tokens within a sequence concurrently. However, natural language exhibits a far more pronounced sequential dependency in comparison to images, and the majority of existing language models are trained with a left-to-right auto-regressive approach. To account for the inherent sequential characteristic of natural language, we introduce Auto-Regressive Diffusion (AR-Diffusion). AR-Diffusion ensures that the generation of tokens on the right depends on the generated ones on the left, a mechanism achieved through employing a dynamic number of denoising steps that vary based on token position. This results in tokens on the left undergoing fewer denoising steps than those on the right, thereby enabling them to generate earlier and subsequently influence the generation of tokens on the right. In a series of experiments on various text generation tasks, including text summarization, machine translation, and common sense generation, AR-Diffusion clearly demonstrated its superiority over existing diffusion language models and that it can be $100\times\sim600\times$ faster when achieving comparable results. Our code is available at this https URL.
From: Tong Wu [view email]
[v1] Tue, 16 May 2023 15:10:22 UTC (442 KB)
[v2] Fri, 19 May 2023 02:08:36 UTC (828 KB)
[v3] Wed, 13 Dec 2023 10:24:00 UTC (829 KB)