Have you ever asked an LLM to tell you a joke? They’re rarely funny at all; they never make you actually laugh.
There’s a deep reason for this, and I think it has serious implications for the limitations of LLMs, not just in comedy, but in art, journalism, research, and science.
If you had to explain the idea of “jokes” to a space alien with no understanding of the idea of humor, you’d explain that a joke is surprising, but inevitable in hindsight.
If you can guess the punchline, the joke won’t be funny. But the punchline also has to be inevitable in hindsight. When you hear the punchline, it has to make you say, “Ah, yes, I should have thought of that myself.”
If, once you hear the punchline, you can’t understand why the punchline was inevitable in hindsight, then we say that you didn’t “get” the joke. The teller can then explain the joke, giving the listener the context to understand why the punchline was inevitable. The joke’s explanation may be enlightening, even surprising in itself, but the joke won’t be funny at that point.
If the joke doesn’t make sense even in hindsight, then it’s a bad joke; nobody “gets” the joke.
There’s no such thing as a universally funny joke, because some people have more context and/or ability to guess/predict punchlines than others. Surprising kids is easy, but they lack context, so much less seems inevitable to them in hindsight. Surprising a professional reviewer is hard (they’ve seen it all, and can guess where you’re going), but all that context allows more stuff to be inevitable in hindsight.
Professional comedians, when they hear good jokes, tend not to laugh out loud, but just say, “Ah, I see. Yes, that’s a good one.” Kids struggle to understand and remember good jokes, even when you explain them. “Inside jokes” between friends can seem inevitable in hindsight to each other, but meaningless to outsiders. (“You had to be there.”)
When you ask large groups of people to vote for the funniest jokes, the jokes are almost never “laugh out loud” funny, even to the majority of the voters. Good jokes get eliminated by voters who don’t have enough context to “get” the joke, and also by voters with so much context that they find every joke “too predictable.”
LLMs are trained to predict what the “next word” in a sentence would be. Their training objective requires the LLM to keep surprise to an absolute minimum.
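“Minimizing surprise” here isn’t a metaphor. The standard training loss, cross-entropy, is literally the model’s surprisal (the negative log-probability) at the word that actually comes next. Here’s a minimal PyTorch sketch, with made-up numbers standing in for a real model:

```python
import torch
import torch.nn.functional as F

# Stand-in for a language model's output at one position: a score for
# every word in the vocabulary, given the context so far.
vocab_size = 50_000
logits = torch.randn(vocab_size)          # illustrative random scores
probs = F.softmax(logits, dim=-1)

# Suppose token id 1234 is the word that actually comes next in the text.
actual_next_token = torch.tensor([1234])

# Surprisal: how surprised the model is by what really happened.
surprisal = -torch.log(probs[1234])

# The standard cross-entropy training loss is exactly that surprisal.
loss = F.cross_entropy(logits.unsqueeze(0), actual_next_token)
assert torch.allclose(loss, surprisal, atol=1e-5)

# Training pushes this number down across billions of tokens: the model
# improves precisely by becoming less surprised by the next word.
```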
When you ask an LLM to tell a joke, the LLM is guessing what joke a majority of people would find funny. The result is almost never funny.
We can’t fix this by throwing more GPUs or more training data at the problem. For the same reason you can’t find funnier jokes by polling a larger and larger number of people, the architecture of LLMs is going to give you unfunny jokes by design.
A good story has to be surprising. If you can predict what will happen next, then the story is boring. But the story events also have to follow from one another — they have to be inevitable in hindsight, or the story won’t make sense.
An engaged audience will constantly try to solve the riddle of what will happen next. If they can guess, they’ll be bored, but if the plot events aren’t inevitable in hindsight, the audience will distrust the story and disengage, unwilling to try to win an unfair game.
To be surprised by fiction, you have to care about what happens in the story. If you disengage with a story, if you don’t care what happens, you won’t be surprised, even if you couldn’t predict the ending.
AI-generated stories read like “AI slop” because LLMs can’t tell good stories, for the exact same reason that LLMs can’t tell good jokes: LLMs are trying to minimize surprise.
In the lingo of journalism, journalists don’t just write “news,” they write “stories.” What’s the difference? It’s the difference between simply reporting today’s events and helping to make sense of those events.
“When a dog bites a man, that is not news, because it happens so often. But if a man bites a dog, that is news.”
Alfred Harmsworth
No one will want to read your reporting if the events aren’t surprising. Being surprising, in itself, is enough to be news, and sometimes it’s worth just breaking an important story in a timely fashion without making sense of it.
But the best journalism doesn’t just say that a surprising event happened, but explains why it happened, why it was inevitable in hindsight. That’s what makes it a “story.”
LLMs suck at journalism because LLMs suck at stories. LLMs suck at discovering surprising facts of all kinds, because LLMs are designed to minimize surprises.
LLMs even suck at finding important mathematical proofs, for the exact same reason.
Correct mathematical proofs are always inevitable in hindsight, but only a few mathematical proofs are surprising, and those are the ones we find important.
Surprising proofs reach conclusions that the mathematical community assumed were wrong, or prove theorems in ways that we thought wouldn’t work, or prove conjectures that we thought might be impossible to prove. (It can be surprising just to prove something in a way that’s shorter and more elegant than anyone thought possible.)
Today, LLMs can certainly prove mathematical theorems, and have already helped discover some new ones, guided by human researchers. But one of the hopes of AGI was that you could just throw a bunch of GPUs at math and have LLMs prove important theorems.
That’s not going to work, because the whole design of LLMs is to avoid surprises. LLMs avoid important mathematical theorems by design.
Software developers have gotten more mileage out of LLM-generated code than mathematicians have gotten out of LLM-generated proofs, because good code is designed to be unsurprising.
Have you heard of “WTFs per minute”? It’s a jocoserious “measure” of code quality, where you review the code and count the number of times it surprises you, the number of times it makes you say, “WTF?”
In programming, the fewer surprises, the better. This is an area where LLMs can thrive.
To reach AGI, or even just to make today’s AIs more useful, we don’t just need bigger and bigger LLMs, minimizing surprise more and more.
We’ll need a system with its own model of the world, with a goal to learn more about the world by refining its model. It will know that it’s learned things when it finds surprising truths that are inevitable to it in hindsight.
In other words, it will need to be curious about the world, seeking the right kinds of surprises, rather than minimizing them.
A curious system will probably use an LLM to discover/verify surprising truths in unsurprising ways. The LLM will thus play a major part in the story, but not the only part; perhaps not even the largest part.
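Here’s a rough sketch of that contrast. Everything in it is a placeholder: the “world” is a toy linear rule, and names like observe and world_model aren’t any real API. The point is just the shape of the loop: probe the world, attend to whatever surprised the model most, then refine the model until that surprise would be inevitable in hindsight.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
DIM = 16

# A toy stand-in for "the world": a fixed rule the learner doesn't know yet.
true_rule = torch.randn(DIM, DIM)

def observe(situation):
    """What actually happens when the system probes this situation."""
    return situation @ true_rule

# The system's own model of the world, which it refines as it learns.
world_model = torch.nn.Linear(DIM, DIM, bias=False)
optimizer = torch.optim.SGD(world_model.parameters(), lr=0.01)

def surprisal(situation, outcome):
    """Prediction error, standing in for how surprising an outcome is."""
    return F.mse_loss(world_model(situation), outcome)

for step in range(200):
    # Probe a handful of situations and see what actually happens.
    situations = [torch.randn(DIM) for _ in range(8)]
    outcomes = [observe(s) for s in situations]

    # Seek surprise: attend to the observation the model predicted worst,
    # the opposite of an LLM's training-time goal of minimizing surprise.
    errors = [surprisal(s, o) for s, o in zip(situations, outcomes)]
    most_surprising = max(range(len(errors)), key=lambda i: errors[i].item())

    # Refine the world model so that, in hindsight, the surprise becomes
    # "inevitable": the same observation would no longer surprise it.
    optimizer.zero_grad()
    errors[most_surprising].backward()
    optimizer.step()
```

Real curiosity-driven systems are far more elaborate (for one thing, they have to decide what to probe before seeing the answer), but the inverted objective, seeking surprise rather than minimizing it, is the key difference.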