The Bitter Lesson is about AI agents

Original link: https://ankitmaloo.com/bitter-lesson/

Richard Sutton's "Bitter Lesson" argues that in AI, raw computing power consistently beats intricate human-designed solutions. The author initially favored elaborate software-engineering approaches but came to realize that AI systems thrive on large-scale compute. Much as plants grow on their own given basic inputs, AI benefits from simple architectures run at massive scale. Customer-support automation is a good example: rule-based systems failed, compute-limited agents struggled, but a "scale-out" solution using parallel processing and multiple reasoning paths succeeded by exploring many candidate solutions. Reinforcement learning demonstrates the point further: post-training compute lets agents discover better ways of solving problems. This shift calls for a change in AI engineering practice: prioritize scalable architectures, parallelizable systems, and flexible learning frameworks, and invest in compute infrastructure. The AI engineer's role is evolving toward building systems that use compute effectively and creating scalable learning environments, rather than crafting intricate algorithms. Ultimately, those who can harness computing power will lead the AI race.

The Hacker News discussion of "The Bitter Lesson" centers on the trade-off between fully autonomous AI and human-supervised systems. One commenter stressed the importance of managing user expectations, suggesting that an agent with slightly lower but more consistent accuracy (80% ± 10%) beats one with potentially higher but more erratic performance (90% ± 40%). Another drew an analogy to chess engines, noting that although superhuman play has been achieved, the market is dominated by "good enough" solutions like Stockfish; relying on massive compute to reach a goal does not guarantee a large market to pay for it. They also pointed to the substantial infrastructure and human effort (such as model training) needed to make compute-heavy systems actually work. Other commenters agreed that more compute generally yields better results than human-guided approaches, though some raised concerns about the cost, particularly of GPUs.

  • Original article

    The Race for AI Progress

    In 2019, Richard Sutton wrote his groundbreaking essay ‘The Bitter Lesson’. Simply put, it concludes that systems which get better with more compute beat systems that do not. Or, specifically in AI: raw computing power consistently wins over intricate human-designed solutions. I used to believe that clever orchestration and sophisticated rules were the key to building better AI systems. That was a typical software dev mentality: you build a system, look for edge cases, cover them, and you are good to go. Boy, was I wrong.

    Think of it like training for a marathon. You could spend months perfecting your running form and buying the latest gear, but nothing beats putting in the miles. In AI, those miles are compute cycles.

    Nature’s Blueprint

    Recently, I was tending to my small garden when it hit me - a perfect analogy for this principle. My plants don’t need detailed instructions to grow. Given the basics (water, sunlight, and nutrients), they figure out the rest on their own. This is exactly how effective AI systems work.

    When we over-engineer AI solutions, we’re essentially trying to micromanage that plant, telling it exactly how to grow each leaf. Not only is this inefficient, but it often leads to brittle systems that can’t adapt to new situations.

    A Tale of Three Approaches

    Today, one of the most common enterprise use cases for AI agents is customer support. Let me share a real-world scenario I encountered while building a customer service automation system:

    1. The Rule-Based Approach: Initially, everyone built an extensive decision tree with hundreds of rules to handle customer queries. It worked for common cases but broke down with slight variations. Maintenance became a nightmare.

    2. The Limited-Compute Agent: Next, with the dawn of ChatGPT, came AI-powered customer agents with modest computing resources. You could write prompts based on patterns you saw in historical data or SOP guidelines. These worked well on simple questions but struggled with complex queries and needed constant human oversight.

      Many AI agents are at this point today. One path is to constrain them even further: branch out, bring in different frameworks and guardrails, so that the agent sticks to the goal. Inadvertently, the compute stays more or less fixed. Or you could try:

    3. The Scale-Out Solution: Then we tried something different - what if we threw more compute at it? Not just bigger GPUs, but fundamentally rethinking how we use AI. We had the agent generate multiple responses in parallel, run several reasoning paths simultaneously, and pick the best outcomes. Each customer interaction could spawn dozens of AI calls exploring different approaches. The system would generate multiple potential responses, evaluate them, and even simulate how the conversation might unfold. Sure, it was computationally expensive - but it worked surprisingly well. The system started handling edge cases we hadn’t even thought of, and more importantly, it discovered interaction patterns that emerged naturally from having the freedom to explore multiple paths.
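
    Before moving on, here is a minimal sketch of that scale-out pattern, as an illustration rather than the exact system we built: `generate_response` and `score_response` are hypothetical stand-ins for an LLM call and an evaluator (another model call, a reward model, or heuristics).

    ```python
    import random
    import concurrent.futures

    # Hypothetical stand-ins: a real system would call an LLM here and evaluate
    # candidates with another model or heuristics. Stubs keep the sketch runnable.
    def generate_response(query: str, strategy: str) -> str:
        return f"[{strategy}] draft answer to: {query}"

    def score_response(query: str, response: str) -> float:
        return random.random()  # replace with a real evaluator

    def answer(query: str, strategies: list, n_per_strategy: int = 4) -> str:
        """Spawn many candidate responses in parallel and keep the best-scoring one."""
        with concurrent.futures.ThreadPoolExecutor() as pool:
            futures = [
                pool.submit(generate_response, query, s)
                for s in strategies
                for _ in range(n_per_strategy)
            ]
            candidates = [f.result() for f in concurrent.futures.as_completed(futures)]
        # More compute means more candidates and better odds of covering edge cases.
        return max(candidates, key=lambda r: score_response(query, r))

    print(answer("My refund hasn't arrived", ["empathetic", "step-by-step", "escalate"]))
    ```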

    Which brings us to:

    The RL Revolution

    In 2025, this pattern becomes even more evident with Reinforcement Learning agents. While many companies are focused on building wrappers around generic models, essentially constraining the model to follow specific workflow paths, the real breakthrough comes from companies investing in post-training RL compute. These RL-enhanced models don’t just follow predefined patterns; they discover entirely new ways to solve problems. Take OpenAI’s Deep Research or Claude’s computer-use capabilities - they demonstrate how investing in compute-heavy post-training processes yields better results than intricate orchestration layers. It’s not that the wrappers are wrong; they just know one way to solve the problem. RL agents, with their freedom to explore and massive compute resources, find better ways we hadn’t even considered.

    The beauty of RL agents lies in how naturally they learn. Imagine teaching someone to ride a bike - you wouldn’t give them a 50-page manual on the physics of cycling. Instead, they try, fall, adjust, and eventually master it. RL agents work similarly but at massive scale. They attempt thousands of approaches to solve a problem, receiving feedback on what worked and what didn’t. Each success strengthens certain neural pathways, each failure helps avoid dead ends.

    For instance, in customer service, an RL agent might discover that sometimes asking a clarifying question early in the conversation, even when seemingly obvious, leads to much better resolution rates. This isn’t something we would typically program into a wrapper, but the agent found this pattern through extensive trial and error. The key is having enough computational power to run these experiments and learn from them.
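
    As a toy illustration of that trial-and-error loop (not the author's actual setup), the sketch below runs a simple epsilon-greedy bandit over the choice "ask a clarifying question first" versus "answer directly". The resolution rates in `TRUE_RESOLUTION_RATE` are invented for the example; the point is only that the better behaviour is discovered from feedback rather than programmed in.

    ```python
    import random

    ACTIONS = ["ask_clarifying_question", "answer_directly"]

    # Invented environment: assume clarifying first resolves 70% of tickets, 55% otherwise.
    TRUE_RESOLUTION_RATE = {"ask_clarifying_question": 0.70, "answer_directly": 0.55}

    def simulate_conversation(action: str) -> float:
        """Return 1.0 if the ticket was resolved, else 0.0 (stand-in for real feedback)."""
        return 1.0 if random.random() < TRUE_RESOLUTION_RATE[action] else 0.0

    def run_bandit(episodes: int = 10_000, epsilon: float = 0.1) -> dict:
        value = {a: 0.0 for a in ACTIONS}  # running estimate of each action's resolution rate
        count = {a: 0 for a in ACTIONS}
        for _ in range(episodes):
            # Explore occasionally; otherwise exploit the current best estimate.
            action = random.choice(ACTIONS) if random.random() < epsilon else max(value, key=value.get)
            reward = simulate_conversation(action)
            count[action] += 1
            value[action] += (reward - value[action]) / count[action]  # incremental mean
        return value

    print(run_bandit())  # estimates converge toward the true rates, so the agent learns to clarify first
    ```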

    What makes this approach powerful is that the agent isn’t limited by our preconceptions. While wrapper solutions essentially codify our current best practices, RL agents can discover entirely new best practices. They might find that combining seemingly unrelated approaches works better than our logical, step-by-step solutions. This is the bitter lesson in action - given enough compute power, learning through exploration beats hand-crafted rules every time.

    Indeed, you see this playing out in the soon-to-be-big competition between Claude Code and Cursor. Currently, users say Cursor does not work well with Claude Sonnet 3.7 but works flawlessly with Sonnet 3.5. On the other hand, people complain that Claude Code (which uses Sonnet 3.7 under the hood) consumes a lot of tokens; however, it works amazingly well. Cursor will reportedly launch a version with usage-based pricing that makes more use of 3.7’s agentic behavior. We will see this in more domains, especially outside of code, where an agent could think of multiple approaches while humans have codified a single workflow.

    What this means for AI Engineers

    This insight fundamentally changes how we should approach AI system design:

    1. Start Simple, Scale Big: Begin with the simplest possible learning architecture that can capture the essence of your problem. Then scale it up with compute rather than adding complexity (a minimal sketch of this idea follows after the list).

    2. Design for Scale: Build systems that can effectively utilize additional compute. This means:
      • Parallelizable architectures
      • Flexible learning frameworks that can grow with more data and compute
      • Infrastructure that can handle distributed processing
    3. Avoid Premature Optimization: Don’t spend weeks optimizing algorithms before you’ve maxed out your compute potential. The returns from clever engineering often pale in comparison to simply adding more computational resources.
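
    To make points 1 and 2 concrete, here is a hedged sketch, with a hypothetical `solve_one_path` standing in for a single agent run: the architecture stays simple, and compute is an explicit budget, so scaling up is a configuration change rather than a redesign.

    ```python
    import concurrent.futures
    from dataclasses import dataclass

    @dataclass
    class ComputeBudget:
        parallel_paths: int = 8       # how many reasoning paths to explore at once
        max_steps_per_path: int = 16  # how deep each path may go before giving up

    def solve_one_path(task: str, max_steps: int, seed: int) -> tuple:
        """Hypothetical single-path solver; in practice an LLM agent loop, stubbed here."""
        return (float(seed % 3), f"candidate solution {seed} for {task!r}")

    def solve(task: str, budget: ComputeBudget) -> str:
        """The same simple architecture at any scale; only the budget changes."""
        with concurrent.futures.ThreadPoolExecutor(max_workers=budget.parallel_paths) as pool:
            futures = [
                pool.submit(solve_one_path, task, budget.max_steps_per_path, seed)
                for seed in range(budget.parallel_paths)
            ]
            results = [f.result() for f in futures]
        return max(results)[1]  # keep the highest-scoring path

    # Scaling up is a config change, not a redesign:
    print(solve("summarize this ticket", ComputeBudget(parallel_paths=4)))
    print(solve("summarize this ticket", ComputeBudget(parallel_paths=64, max_steps_per_path=64)))
    ```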

    The Real “So What”

    The implications are profound and somewhat uncomfortable for us engineers:

    1. Investment Strategy: Organizations should invest more in computing infrastructure than in complex algorithmic development.

    2. Competitive Advantage: The winners in AI won’t be those with the cleverest algorithms, but those who can effectively harness the most compute power.

    3. Career Focus: As AI engineers, our value lies not in crafting perfect algorithms but in building systems that can effectively leverage massive computational resources. That is a fundamental shift in mental models of how to build software.

    Looking Forward

    This lesson might seem to diminish the role of the AI engineer, but it actually elevates it. Our job is to:

    • Design systems that can effectively utilize increasing compute resources
    • Build robust learning environments that scale
    • Create architectures that can grow without requiring fundamental redesigns

    The future belongs to those who can build systems that learn and adapt through computational force, not those who try to encode human knowledge into rigid rules.

    Remember: In the race between clever engineering and raw compute, compute wins. Our role is to build the race track, not to design the runner’s every move.
