The Eternal Sloptember

Original link: https://geohot.github.io//blog/jekyll/update/2026/05/24/the-eternal-sloptember.html

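The author argues that integrating AI agents into software development is a major mistake. He contends that current models are essentially sophisticated statistical engines that mimic programming rather than truly understand it. While acknowledging that AI is useful for rapid prototyping and information retrieval, the author holds that it cannot produce high-quality, production-ready code. The core concern is that AI agents consistently fall short on the “polish” that robust software requires, and tend to introduce subtle bugs that are hard to detect. High-performing individuals may be able to manage these tools by manually auditing every line of code, but large organizations risk having their ecosystems flooded with “slop.” Because these organizations lack the tight feedback loops that top engineers have, reliance on AI is likely to lower rather than raise overall software quality. Ultimately, the author believes large language models lack the world model needed for real programming. By mistaking statistical output for intelligent, purposeful construction, companies are falling for a “psyop” that prioritizes volume of output over substance, which will lead to a future filled with unmaintainable, broken code.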

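This Hacker News discussion centers on the current debate over AI's role in software engineering, sparked by the concept of the “Eternal Sloptember.” Participants are sharply divided on the technology's usefulness. Supporters see AI as a powerful evolution of search that significantly boosts productivity, handles tedious boilerplate, and assists with research or debugging. They argue that although AI cannot replace human architects, it generates functional code faster than people can, which is a net gain for day-to-day work. Skeptics, by contrast, describe current AI as a sophisticated statistical model prone to unpredictable errors, comparing it to a “self-driving car” that needs constant human supervision. Critics argue that AI struggles with novel or complex tasks and tends to produce “slop” that imitates existing training data but cannot cope with cutting-edge technology stacks or high-stakes production systems. Overall, the discussion reflects a broader tension in the industry: many acknowledge AI as an indispensable pattern-matching tool while warning against treating it as an autonomous agent capable of robust software engineering. Most agree that the truth lies somewhere between “AI will replace us all” and “AI is a complete fraud.”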

Original text

I’m calling it now, the adoption of AI agents into software development will be one of the most costly mistakes in the field’s history. Agents cannot program, and it’s taking longer and longer to realize that they can’t. They are a highly sophisticated statistical model designed to mimic the distribution of programming. The output is broken, but in a way that’s getting harder and harder to detect. Which is exactly what you’d expect from an increasingly accurate statistical model.

At first, I rejected this. I bought into the Twitter explanation of status anxiety. I define some of my self worth by my programming abilities, so wouldn’t it make sense to get defensive around that loss? Deny the models can code for as long as I could to preserve my ego?

I mean, it’s very clear they can solve math problems I couldn’t hope to solve if I devoted my life to it. So why can’t they program? Maybe I’m just not a good enough programmer to recognize their genius.

I really tried for the last 6 months. I wrote some parts of tinygrad with agents. I reversed a USB <-> PCIe chip with agents. But each time I suspected I could have done it better and faster manually. The agent frontloads all the progress, then gives you a slot machine lever to pull to hope it gets the polish done. It never quite gets there.

And in before, “you are using it wrong.” I have tried all the different models, different harnesses, different prompts. It’s not this. The people who say this would probably say the same thing about slot machines: you see, you have to bet 5 lines after you get a cherry, no wonder you aren’t winning!

I’m not saying that AI isn’t useful, it clearly is. It’s definitely a better Google for most searches. And whenever you need a quick prototype and don’t care about polish, it is absurdly fast. But is it a software engineer? Not close to the bar at any company I have worked at. The key aspect is knowing when to use it and when not to.

I thought more about the self worth preservation thing. AFL found more bugs than LLMs and nobody felt that way about it. Chess and Go are more popular than ever. I cannot fucking wait until I have armies of robot associates I can trust to clean up my code! I don’t fear loss of status, I almost think this is some kind of psyop to sell agents. Fear of loss is one of the only ways to make big companies move. Though I think in that fear they are making a big mistake.

Agents will end up hurting large organizations more than high performing individuals or small orgs. I’ve watched how my friends and coworkers have adopted these tools over the last 6 months. A trait you find in all high performing people is the ability to error correct, and they have mostly been good at seeing when slop is slop. It takes a bit to explore/exploit and tune the outer loops around when to use them, when to trust them, how to use them, etc., but I haven’t seen any of them move to a model where they don’t carefully read and understand each line, except in some confined domains.

Contrast this with a large organization. Much slower feedback loops, much less alignment. The bottom performers won’t have that self check. They are the ones producing 10x output with the agents. What do you think is happening to the average output of that organization? What is happening to the average output of the world?

Agents will end up producing more code, more apps, and more features than ever before. It is a golden era for buckets and buckets of slop, and a dark age for gems of quality.

I hear that Apple is pushing AI on all their engineers. When people think in the abstract, they think AI will do all this stuff, but let’s focus on a concrete example. Do you think macOS will get better or worse in the next 2 years?

When people see an artifact, they make assumptions about the process that was used to create it. Without even thinking about it, they assume the creator had a basically human state of mind. This assumption is no longer true. Things can be broken in ways that weren’t previously possible, and old proxies of underlying quality like syntax and grammar are useless. AI produced artifacts are not produced by the same process as human ones, and this difference, while extremely subtle in statistics, makes itself obvious when you try to interact with and build on the artifact in human ways.

Without fully endorsing all their ideas, I’m now in the LeCun/Marcus camp on LLMs. I don’t think models like this will ever be able to program, I think the process matters. I think that deep learning is still the solution, but real programming agents will need world models, not some RLVR shit that comments out the failing test and tells you all the tests are now passing.

The real story of this era will be who manages to avoid harming themselves in their AI psychosis.
