The last six months in LLMs, illustrated by pelicans on bicycles

Original link: https://simonwillison.net/2025/Jun/6/six-months-in-llms/

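In March, OpenAI’s "GPT-4o" native multimodal image generation was a huge success, signing up 100 million new users in a week and a million new accounts in a single peak hour. One user’s experience, however, highlighted a potential drawback. He asked the AI to dress his dog, Cleo, in a pelican costume, and the generated image unexpectedly included a "Half Moon Bay" sign that he had not asked for. The unexpected addition was attributed to ChatGPT’s new memory feature, which consults previous conversations for context. Although he eventually got the image he wanted after correcting the AI, he voiced concern about what the feature does to control over inputs. He argues that features like ChatGPT memory weaken a user’s ability to control precisely what the AI does, and he prefers to stay in complete control. He therefore turned memory off and elaborated on his concerns in a post titled "I really don’t like ChatGPT’s new memory dossier".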

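This Hacker News thread discusses Simon Willison’s humorous use of "a pelican riding a bicycle" as a benchmark for evaluating large language models (LLMs). The benchmark prompted debate about its validity: some commenters argued that a single sample is not enough for comparison and that training data bias affects the results. Others saw it as a fun way to observe the models’ limitations and knowledge gaps. Commenters noted that LLM marketing portrays the models as something more than probabilistic models, and discussed how hard it is to represent complex concepts such as engineering design. Some suggested using a vision model to pick the best image from multiple outputs. The discussion also touched on whether LLMs suit more abstract design tasks, with the view that for now they cannot replace real-world engineering and human communication skills. Finally, someone remarked that CEOs might use it even though it is not perfect. Overall, the thread explores the challenges and limitations of LLMs, centered on the pelican metaphor.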

Original text

Also in March, OpenAI launched the "GPT-4o native multimodal image generation" feature they had been promising us for a year.

This was one of the most successful product launches of all time. They signed up 100 million new user accounts in a week! They had a single hour where they signed up a million new accounts, as this thing kept on going viral again and again and again.

I took a photo of my dog, Cleo, and told it to dress her in a pelican costume, obviously.

But look at what it did—it added a big, ugly sign in the background saying Half Moon Bay.

I didn’t ask for that. My artistic vision has been completely compromised!

This was my first encounter with ChatGPT’s new memory feature, where it consults pieces of your previous conversation history without you asking it to.

I told it off and it gave me the pelican dog costume that I really wanted.

But this was a warning that we risk losing control of the context.

As a power user of these tools, I want to stay in complete control of what the inputs are. Features like ChatGPT memory are taking that control away from me.

I don’t like them. I turned it off.

I wrote more about this in "I really don’t like ChatGPT’s new memory dossier".
