Original link: https://news.ycombinator.com/item?id=44060533
Here's a short summary of the Hacker News discussion:
A user asked why diffusion language models remain prevalent despite flow matching's perceived superiority in image generation. Another user suggested that the field's accumulated expertise in training and fine-tuning diffusion models may explain their continued dominance.
Links to a previous discussion and to research papers were shared, suggesting that diffusion models may exhibit better reasoning by avoiding the early-token bias that affects autoregressive models.
Other users discussed combining diffusion and transformer architectures, possibly alternating their roles within a single interface depending on context. The author of the linked blog post clarified that current diffusion model implementations must compute attention scores across the entire sequence, even when denoising only a portion of the text, which limits their cacheability advantages compared to autoregressive models. Users thanked the author for the post.
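A minimal sketch of the cacheability point raised by the author, assuming single-head attention with toy weight matrices; the function names and shapes here are illustrative, not taken from the post or any library:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

d = 8  # head dimension (toy value)

def causal_step(x_new, K_cache, V_cache, Wq, Wk, Wv):
    """Autoregressive decoding: the one new token attends only to the
    past, so previously computed K/V rows can be cached and reused."""
    q = x_new @ Wq                       # query for the new token only
    k, v = x_new @ Wk, x_new @ Wv        # its K/V row, appended to the cache
    K = np.vstack([K_cache, k])
    V = np.vstack([V_cache, v])
    attn = softmax(q @ K.T / np.sqrt(d))  # 1 x (t+1) scores, not N x N
    return attn @ V, K, V                 # output plus the grown cache

def bidirectional_denoise_step(X, Wq, Wk, Wv):
    """Diffusion-style denoising: every position attends to every other,
    and each step rewrites tokens, so all N x N scores are recomputed."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d))  # full N x N, no reusable cache
    return attn @ V
```

With causal masking, each decoding step adds only one new K/V row; with bidirectional denoising, editing any token changes the attention scores at every position, so the previous step's work cannot be reused.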