Everything around LLMs is still magical and wishful thinking

Original link: https://dmitriid.com/everything-around-llms-is-still-magical-and-wishful-thinking

The varying opinions on LLMs stem from a lack of standardized and quantifiable descriptions of how they're used. One person's success story in a greenfield React project with a specific LLM agent can't be compared to another's struggles with a proprietary OCaml codebase using a different one due to the non-deterministic nature of these tools. Crucially, we lack critical details in reported experiences: project specifics, codebases (age, type), user expertise, relevance of expertise, and the amount of post-LLM work required. This absence fuels hype and magical thinking, where anecdotal successes gain traction regardless of context. Even claims from "industry leaders" often lack crucial details, like codebase size, bug complexity, or required oversight. Skeptics are often portrayed as clueless morons. However, many critics actively use these tools daily, experiencing their limitations firsthand. They see LLMs as statistical machines, not magic or reliable engineering.

A Hacker News thread discusses an article arguing that the conversation around LLMs is still magical and based on wishful thinking. Commenters offer diverse viewpoints. Some highlight significant productivity gains, especially in code generation and automating data-related tasks, leading to roles evolving towards architecture and management. One user claims not to have written a line of code for months due to AI assistance. Others caution against overblown expectations of 10x productivity, citing Amdahl's law and the time spent on non-coding tasks like communication and research. Concerns are raised about the cost of LLM tools impacting overall production expenses. Several users emphasize the value of LLMs for brainstorming and research, treating them as thinking partners rather than replacements. Skepticism remains about trusting AI-generated code for mission-critical applications, and about the risk of AI overpromising in the context of project implementation. Some see the AI hype as another version of what happened with crypto, with too much misrepresentation of the tools' capabilities.

Original text

Hacker News brought this gem of a comment in yet another discussion about AI:

Much of the criticism of AI on HN feels driven by devs who have not fully ingested what is going with MCP, tools etc. right now as not looked deeper than making API calls to an LLM

As I responded, this is crypto all over again. If you dare question anything around crypto AI, you're just a clueless moron who hasn't realised the one true meaning of things.

Another person chimed in with an astute observation:

The huge gap between the people who claim "It helps me some/most of the time" and the other people who claim "I've tried everything and it's all bad" is really interesting to me.

The answer to this is easy, simple, and rather obvious. However, in an industry increasingly overwhelmed by magical, wishful thinking, I haven't seen many people address this.

So why is there such a gap? Why do some people see LLMs as magical wish-granting miracles, while others dismiss them as useless?

I've answered in the comments, and I'll reproduce the answer here.

Because we only see very disjointed descriptions, with no attempt to quantify what we're talking about.

For every description of how LLMs work or don't work, we know only some, but never all, of the following:

  • Do we know which projects people work on? No

  • Do we know which codebases (greenfield, mature, proprietary etc.) people work on? No

  • Do we know the level of expertise the people have? No. Is the expertise in the same domain, codebase, language that they apply LLMs to? We don't know.

  • How much additional work did they have reviewing, fixing, deploying, finishing etc.? We don't know.

Even if you have one person describing all of the above, you will not be able to compare their experience to anyone else's because you have no idea what others answer for any of those bullet points.

And that's before we get into how all these systems and agents are completely non-deterministic, and what works now may not work even 1 minute from now for the exact same problem.

And that's before we ask the question of how a senior engineer's experience with a greenfield project in React with one agent and model can even be compared to a non-coding designer in a closed-source proprietary codebase in OCaml with a different agent and model (or even the same agent/model, because of non-determinism).
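The non-determinism point above can be illustrated with a toy sketch. This is not any real LLM API, just a minimal next-token sampler: the model assigns scores (logits) to candidate tokens, and any sampling temperature above zero means repeated runs can pick different tokens for the exact same input. The token names and scores here are made up for illustration.

```python
import math
import random

def sample_token(logits: dict[str, float], temperature: float, rng: random.Random) -> str:
    """Pick one token from made-up scores. temperature=0 degenerates to greedy argmax."""
    if temperature == 0:
        return max(logits, key=logits.get)
    # Softmax-style weighting, then a weighted random draw.
    weights = [math.exp(v / temperature) for v in logits.values()]
    return rng.choices(list(logits), weights=weights, k=1)[0]

logits = {"fix": 1.0, "refactor": 0.9, "delete": 0.8}  # hypothetical scores

# Same input, fifty independent runs (fifty different random states).
greedy = {sample_token(logits, 0, random.Random(i)) for i in range(50)}
sampled = {sample_token(logits, 1.0, random.Random(i)) for i in range(50)}

print(greedy)   # always {'fix'}: greedy decoding is deterministic
print(sampled)  # several distinct tokens: the same prompt yields different output
```

Real systems compound this over thousands of tokens per response, which is why the same prompt against the same agent can succeed once and fail a minute later.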

And yet, hype and magic have such a sway over our industry that seemingly a majority of people just buy in to whatever claim, however outrageous it is and regardless of whether it's true.

It's especially egregious when it comes from "industry leaders" who just say things like this:

I've been using Claude Code for a couple of days, and it has been absolutely ruthless in chewing through legacy bugs in my gnarly old code base. It's like a wood chipper fueled by dollars. It can power through shockingly impressive tasks, using nothing but chat.

You don't even select context. You just open your heart and your wallet, and Claude Code takes the wheel.

... As long as the bank authorizations keep coming through, it will push on bug fixes until they're deployed in production, and then start scanning through the user logs to see how well it's doing.

  • How large is the codebase? Unknown.

  • What bugs? Unknown.

  • Any additional babysitting? Unknown.

  • Perhaps programming language and frameworks? Unknown.

And yet there are 1.8k likes and 204 reposts.

So yeah. If you don't turn off the part of your brain responsible for critical thinking and buy into the hype hook, line, and sinker, you're a clueless moron who doesn't understand the true meaning of things.

Wait. "What about you, the author?", you may ask.

I've used most of the tools available under the sun in multiple combinations. I have side projects entirely designed by Vercel's v0. I have a full monitoring app built in SwiftUI (I know zero Swift) with Claude Code. I create posters for events I host with Midjourney. I vibe-coded an MCP server in Elixir (but not in phoenix.new).

Like most skeptics and critics, I use these tools daily.

And 50% of the time they work 50% of the time.

It's a non-deterministic statistical machine. When it works, it may feel like magic. But it's neither magic nor is it engineering.

The whole discourse around LLMs assumes it's strictly one of the two.

And here we are.
