Gemini Diffusion

原始链接: https://simonwillison.net/2025/May/21/gemini-diffusion/

At Google I/O, Google announced Gemini Diffusion, its first language model to use diffusion rather than transformers for text generation. Unlike traditional autoregressive models, which generate text token by token in sequence, Gemini Diffusion refines noise step by step, enabling faster iteration and error correction during generation. This is especially useful for tasks like math and code editing. Its main advantage is speed: in early user testing it generated an interactive HTML+JavaScript chat app within seconds, at 857 tokens per second, comparable to the Cerebras Coder tool running Llama 3.1-70b on Cerebras hardware. Google claims Gemini Diffusion matches the performance of Gemini 2.0 Flash-Lite at five times the speed. Prior to this, Inception Mercury was the only other commercially available diffusion model for language. Independent benchmarks are still pending, but the initial speed is impressive.

The Hacker News discussion centers on Google's new Gemini Diffusion code-generation model and its potential impact on the AI field. Key points include its speed, achieved through a diffusion-based approach rather than traditional autoregression. Some commenters argue that the model's knowledge lives in the FFN (feed-forward network) layers, and that the attention mechanism itself is not especially unique or important; others point to benefits such as editing capability and efficient use of compute. The thread debates the relative importance of attention versus residual connections, suggesting the former may be replaceable. Others argue that large language models lack an understanding of what is *absent* from a codebase, which limits their effectiveness. The discussion also explores whether the diffusion process should operate character by character (token by token) or sentence by sentence (paragraph by paragraph) to produce more coherent text, and finally whether diffusion models can self-correct during the diffusion process.

Original text

Gemini Diffusion. Another of the announcements from Google I/O yesterday was Gemini Diffusion, Google's first LLM to use diffusion (similar to image models like Imagen and Stable Diffusion) in place of transformers.

Google describe it like this:

Traditional autoregressive language models generate text one word – or token – at a time. This sequential process can be slow, and limit the quality and coherence of the output.

Diffusion models work differently. Instead of predicting text directly, they learn to generate outputs by refining noise, step-by-step. This means they can iterate on a solution very quickly and error correct during the generation process. This helps them excel at tasks like editing, including in the context of math and code.
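Gemini Diffusion's internals haven't been published, but the step-by-step refinement Google describes can be illustrated with a generic masked-denoising sketch. The names here (`diffusion_decode`, `tokens_per_step`, the "cheat" of copying from a known target instead of running a neural network) are all invented for illustration; this is a toy, not Google's method.

```python
MASK = "<mask>"

def diffusion_decode(target, tokens_per_step=2):
    """Toy sketch of masked-diffusion text generation.

    Start from an all-mask sequence and iteratively fill in tokens.
    A real diffusion LM predicts tokens with a neural network and picks
    the positions it is most confident about; here we simply copy from
    a known target so the sketch stays runnable and self-contained.
    """
    seq = [MASK] * len(target)
    steps = 0
    while MASK in seq:
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        # Several positions are filled per step, in parallel -- the
        # source of the speed advantage over autoregressive decoding,
        # which commits to exactly one new token per step.
        for i in masked[:tokens_per_step]:
            seq[i] = target[i]
        steps += 1
    return seq, steps

seq, steps = diffusion_decode(["build", "a", "chat", "app"], tokens_per_step=2)
```

With `tokens_per_step=2`, a four-token sequence finishes in two refinement steps where autoregression would take four; and because every step re-reads the whole sequence, earlier tokens can in principle be revised, which is the error-correction property the announcement highlights.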

The key feature then is speed. I made it through the waitlist and tried it out just now and wow, they are not kidding about it being fast.

In this video I prompt it with "Build a simulated chat app" and it responds at 857 tokens/second, resulting in an interactive HTML+JavaScript page (embedded in the chat tool, Claude Artifacts style) within single digit seconds.

The performance feels similar to the Cerebras Coder tool, which used Cerebras to run Llama 3.1-70b at around 2,000 tokens/second.

How good is the model? I've not seen any independent benchmarks yet, but Google's landing page for it promises "the performance of Gemini 2.0 Flash-Lite at 5x the speed" so presumably they think it's comparable to Gemini 2.0 Flash-Lite, one of their least expensive models.

Prior to this the only commercial grade diffusion model I've encountered was Inception Mercury, back in February this year.
