Have you ever asked an LLM to tell you a joke? They’re rarely funny at all; they never make you actually laugh.
There’s a deep reason for this, and I think it has serious implications for the limitations of LLMs, not just in comedy, but in art, journalism, research, and science.
If you had to explain the idea of “jokes” to a space alien with no understanding of the idea of humor, you’d explain that a joke is surprising, but inevitable in hindsight.
If you can guess the punchline, the joke won’t be funny. But the punchline also has to be inevitable in hindsight. When you hear the punchline, it has to make you say, “Ah, yes, I should have thought of that myself.”
If, once you hear the punchline, you can’t understand why the punchline was inevitable in hindsight, then we say that you didn’t “get” the joke. The teller can then explain the joke, giving the listener the context to understand why the punchline was inevitable. The joke’s explanation may be enlightening, even surprising in itself, but the joke won’t be funny at that point.
If the joke doesn’t make sense even in hindsight, then it’s a bad joke; nobody “gets” the joke.
There’s no such thing as a universally funny joke, because some people have more context and/or ability to guess/predict punchlines than others. Surprising kids is easy, but they lack context, so much less seems inevitable to them in hindsight. Surprising a professional reviewer is hard (they’ve seen it all, and can guess where you’re going), but all that context allows more stuff to be inevitable in hindsight.
Professional comedians, when they hear good jokes, tend not to laugh out loud, but just say, “Ah, I see. Yes, that’s a good one.” Kids struggle to understand and remember good jokes, even when you explain them. “Inside jokes” between friends can seem inevitable in hindsight to each other, but meaningless to outsiders. (“You had to be there.”)
When you ask large groups of people to vote for the funniest jokes, the jokes are almost never “laugh out loud” funny, even to the majority of the voters. Good jokes get eliminated by voters who don’t have enough context to “get” the joke, and also by voters with so much context that they find every joke “too predictable.”
LLMs are trained to predict what the “next word” in a sentence would be. Their training objective requires the LLM to keep surprise to an absolute minimum.
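“Minimizing surprise” here isn’t a metaphor. The standard training loss, cross-entropy, is literally the model’s surprisal (the negative log-probability) at the word that actually comes next. Here’s a minimal PyTorch sketch, with made-up numbers standing in for a real model:

```python
import torch
import torch.nn.functional as F

# Stand-in for a language model's output at one position: a score for
# every word in the vocabulary, given the context so far.
vocab_size = 50_000
logits = torch.randn(vocab_size)          # illustrative random scores
probs = F.softmax(logits, dim=-1)

# Suppose token id 1234 is the word that actually comes next in the text.
actual_next_token = torch.tensor([1234])

# Surprisal: how surprised the model is by what really happened.
surprisal = -torch.log(probs[1234])

# The standard cross-entropy training loss is exactly that surprisal.
loss = F.cross_entropy(logits.unsqueeze(0), actual_next_token)
assert torch.allclose(loss, surprisal, atol=1e-5)

# Training pushes this number down across billions of tokens: the model
# improves precisely by becoming less surprised by the next word.
```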
When you ask an LLM to tell a joke, the LLM is guessing what joke a majority of people would find funny. The result is almost never funny.
We can’t fix this by throwing more GPUs or more training data at the problem. For the same reason you can’t find funnier jokes by polling a larger and larger number of people, the architecture of LLMs is going to give you unfunny jokes by design.
A good story has to be surprising. If you can predict what will happen next, then the story is boring. But the story events also have to follow from one another — they have to be inevitable in hindsight, or the story won’t make sense.
An engaged audience will constantly try to solve the riddle of what will happen next. If they can guess, they’ll be bored, but if the plot events aren’t inevitable in hindsight, the audience will distrust the story and disengage, unwilling to try to win an unfair game.
To be surprised by fiction, you have to care about what happens in the story. If you disengage with a story, if you don’t care what happens, you won’t be surprised, even if you couldn’t predict the ending.
AI-generated stories read like “AI slop” because LLMs can’t tell good stories, for the exact same reason that LLMs can’t tell good jokes: LLMs are trying to minimize surprise.
In the lingo of journalism, journalists don’t just write “news,” they write “stories.” What’s the difference? It’s the difference between simply reporting today’s events and helping to make sense of those events.
“When a dog bites a man, that is not news, because it happens so often. But if a man bites a dog, that is news.”
Alfred Harmsworth
No one will want to read your reporting if the events aren’t surprising. Being surprising, in itself, is enough to be news, and sometimes it’s worth just breaking an important story in a timely fashion without making sense of it.
But the best journalism doesn’t just say that a surprising event happened, but explains why it happened, why it was inevitable in hindsight. That’s what makes it a “story.”
LLMs suck at journalism because LLMs suck at stories. LLMs suck at discovering surprising facts of all kinds, because LLMs are designed to minimize surprises.
LLMs even suck at finding important mathematical proofs, for the exact same reason.
Correct mathematical proofs are always inevitable in hindsight, but only a few mathematical proofs are surprising, and those are the ones we find important.
Surprising proofs reach conclusions that the mathematical community assumed were wrong, or prove theorems in ways that we thought wouldn’t work, or prove conjectures that we thought might be impossible to prove. (It can be surprising just to prove something in a way that’s shorter and more elegant than anyone thought possible.)
Today, LLMs can certainly prove mathematical theorems, and have already helped discover some new ones, guided by human researchers. But one of the hopes of AGI was that you could just throw a bunch of GPUs at math and have LLMs prove important theorems.
That’s not going to work, because the whole design of LLMs is to avoid surprises. LLMs avoid important mathematical theorems by design.
Software developers have gotten more mileage out of LLM-generated code than mathematicians have gotten out of LLM-generated proofs, because good code is designed to be unsurprising.
Have you heard of “WTFs per minute”? It’s a jocoserious “measure” of code quality, where you review the code and count the number of times it surprises you, the number of times it makes you say, “WTF?”
In programming, the fewer surprises, the better. This is an area where LLMs can thrive.
To reach AGI, or even just to make today’s AIs more useful, we don’t just need bigger and bigger LLMs, minimizing surprise more and more.
We’ll need a system with its own model of the world, with a goal to learn more about the world by refining its model. It will know that it’s learned things when it finds surprising truths that are inevitable to it in hindsight.
In other words, it will need to be curious about the world, seeking the right kinds of surprises, rather than minimizing them.
A curious system will probably use an LLM to discover/verify surprising truths in unsurprising ways. The LLM will thus play a major part in the story, but not the only part; perhaps not even the largest part.
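Here’s a rough sketch of that contrast. Everything in it is a placeholder: the “world” is a toy linear rule, and names like observe and world_model aren’t any real API. The point is just the shape of the loop: probe the world, attend to whatever surprised the model most, then refine the model until that surprise would be inevitable in hindsight.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
DIM = 16

# A toy stand-in for "the world": a fixed rule the learner doesn't know yet.
true_rule = torch.randn(DIM, DIM)

def observe(situation):
    """What actually happens when the system probes this situation."""
    return situation @ true_rule

# The system's own model of the world, which it refines as it learns.
world_model = torch.nn.Linear(DIM, DIM, bias=False)
optimizer = torch.optim.SGD(world_model.parameters(), lr=0.01)

def surprisal(situation, outcome):
    """Prediction error, standing in for how surprising an outcome is."""
    return F.mse_loss(world_model(situation), outcome)

for step in range(200):
    # Probe a handful of situations and see what actually happens.
    situations = [torch.randn(DIM) for _ in range(8)]
    outcomes = [observe(s) for s in situations]

    # Seek surprise: attend to the observation the model predicted worst,
    # the opposite of an LLM's training-time goal of minimizing surprise.
    errors = [surprisal(s, o) for s, o in zip(situations, outcomes)]
    most_surprising = max(range(len(errors)), key=lambda i: errors[i].item())

    # Refine the world model so that, in hindsight, the surprise becomes
    # "inevitable": the same observation would no longer surprise it.
    optimizer.zero_grad()
    errors[most_surprising].backward()
    optimizer.step()
```

Real curiosity-driven systems are far more elaborate (for one thing, they have to decide what to probe before seeing the answer), but the inverted objective, seeking surprise rather than minimizing it, is the key difference.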