12-Factor Agents: Patterns of Reliable LLM Applications

Original link: https://github.com/humanlayer/12-factor-agents

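Dex from HumanLayer introduces "12-factor agents," a guide for building reliable, scalable, and maintainable production software powered by large language models (LLMs). He contrasts the promise of fully autonomous agents with the reality that many "AI agent" products are actually highly deterministic code with LLM calls integrated at strategic points. He argues that simply giving an LLM a goal and a set of tools to loop on until it finishes is usually not enough. Dex observes that many developers first adopt an agent framework to move fast, struggle to reach a satisfactory quality bar, and end up redesigning and rebuilding from scratch. He recommends a more pragmatic approach: incorporate modular agent-building concepts into existing products. This lets experienced software engineers, even without extensive AI experience, adopt agent techniques incrementally and build higher-quality customer-facing AI features. The guide aims to distill the core principles of building good LLM applications without a complete framework overhaul.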

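The author, Dhorthy, shares lessons from building production AI systems. He notes that successful "AI agents" are often well-engineered software with LLMs strategically embedded, rather than overly complex agentic systems. They propose the "12-factor agents" principles, inspired by Heroku's 12-factor app, to improve the reliability, scalability, and maintainability of LLM-powered applications. The core idea is that integrating modular AI concepts into existing products works better than building a dedicated agent framework from scratch. Commenters discussed how workflows are more valuable than agents, the importance of control flow, and the need for purpose-built tools for debugging and monitoring AI systems. pancsta showcased their "AI agent framework" SecAI, which focuses on graph control flow, state management, and developer tooling. Dhorthy praised SecAI's terminal UI and OTEL integration. The discussion highlighted a practical, engineering-driven approach to building reliable AI applications.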

(Comments) 2025-03-19

(Comments) 2025-03-13

Original text

In the spirit of 12 Factor Apps. The source for this project is public at https://github.com/humanlayer/12-factor-agents, and I welcome your feedback and contributions. Let's figure this out together!

Hi, I'm Dex. I've been hacking on AI agents for a while.

I've tried every agent framework out there, from the plug-and-play crew/langchains to the "minimalist" smolagents of the world to the "production grade" langgraph, griptape, etc.

I've talked to a lot of really strong founders, in and out of YC, who are all building really impressive things with AI. Most of them are rolling the stack themselves. I don't see a lot of frameworks in production customer-facing agents.

I've been surprised to find that most of the products out there billing themselves as "AI Agents" are not all that agentic. A lot of them are mostly deterministic code, with LLM steps sprinkled in at just the right points to make the experience truly magical.

Agents, at least the good ones, don't follow the "here's your prompt, here's a bag of tools, loop until you hit the goal" pattern. Rather, they are mostly just software.
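To make "mostly deterministic code with LLM steps sprinkled in" concrete, here's a rough sketch. Everything in it is hypothetical: the ticket fields, the label set, and `classify`, which stands in for whatever model call you'd actually make. Plain code owns the control flow, the LLM handles only the one fuzzy step, and its output gets validated before anything trusts it.

```python
from typing import Callable

# Hypothetical label set; anything else the model returns goes to a human.
ALLOWED_LABELS = {"billing", "bug_report", "feature_request"}


def route_ticket(ticket: dict, classify: Callable[[str], str]) -> str:
    """Route a support ticket with deterministic rules plus one LLM step.

    `classify` is a hypothetical stand-in for a model call that returns a label.
    """
    # Deterministic rules handle the easy, unambiguous cases first.
    if ticket.get("is_refund_request"):
        return "billing"
    if not ticket.get("body", "").strip():
        return "needs_more_info"

    # The one LLM step: classify free-form text...
    label = classify(ticket["body"]).strip().lower()

    # ...then validate its output in plain code before trusting it.
    return label if label in ALLOWED_LABELS else "human_review"


# Usage with fake classifiers in place of a real model:
print(route_ticket({"body": "Export crashes the app"}, lambda text: " Bug_Report "))  # bug_report
print(route_ticket({"body": "hello?"}, lambda text: "spam"))  # human_review
```

The model never decides what happens next on its own. It just fills in one value that ordinary code checks and acts on.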

So, I set out to answer:

What are the principles we can use to build LLM-powered software that is actually good enough to put in the hands of production customers?

Welcome to 12-factor agents. As every Chicago mayor since Daley has consistently plastered all over the city's major airports, we're glad you're here.

Special thanks to @iantbutler01, @tnm, @hellovai, @stantonk, @balanceiskey, @AdjectiveAllison, @pfbyjy, @a-churchill, and the SF MLOps community for early feedback on this guide.

The Short Version: The 12 Factors

Even if LLMs continue to get exponentially more powerful, there will be core engineering techniques that make LLM-powered software more reliable, more scalable, and easier to maintain.

For a deeper dive on my agent journey and what led us here, check out A Brief History of Software - a quick summary here:

We're gonna talk a lot about Directed Graphs (DGs) and their Acyclic friends, DAGs. I'll start by pointing out that...well...software is a directed graph. There's a reason we used to represent programs as flow charts.

Around 20 years ago, we started to see DAG orchestrators become popular. We're talking classics like Airflow and Prefect, some predecessors, and some newer ones like Dagster, Inngest, and Windmill. These followed the same graph pattern, with the added benefit of observability, modularity, retries, administration, etc.
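For contrast with what comes next, here's a minimal sketch of the orchestrator idea using Python's standard-library `graphlib`: tasks run in dependency order, with naive retries. The task names, the `run_dag` helper, and the retry policy are made up for illustration. Real orchestrators layer scheduling, persistence, and observability on top of something like this.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List, Set


def run_dag(
    tasks: Dict[str, Callable[[], object]],
    deps: Dict[str, Set[str]],
    max_retries: int = 2,
) -> List[str]:
    """Run every task after its dependencies, retrying failed tasks.

    tasks: task name -> zero-argument callable
    deps:  task name -> names of the tasks it depends on
    Returns task names in the order they completed.
    """
    # Every task is a node; tasks missing from `deps` have no prerequisites.
    graph = {name: deps.get(name, set()) for name in tasks}
    completed: List[str] = []
    for name in TopologicalSorter(graph).static_order():
        for attempt in range(max_retries + 1):
            try:
                tasks[name]()
                break
            except Exception:
                if attempt == max_retries:
                    raise  # out of retries: fail the whole run
        completed.append(name)
    return completed


# Hypothetical pipeline where "transform" fails once, then succeeds on retry.
attempts = {"transform": 0}


def flaky_transform() -> None:
    attempts["transform"] += 1
    if attempts["transform"] < 2:
        raise RuntimeError("transient failure")


order = run_dag(
    tasks={"extract": lambda: None, "transform": flaky_transform, "load": lambda: None},
    deps={"transform": {"extract"}, "load": {"transform"}},
)
print(order)  # ['extract', 'transform', 'load']
```

Every edge here is written by an engineer ahead of time. `graphlib` also raises `CycleError` if you accidentally add a cycle, which is the "acyclic" part of DAG enforced for you.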

I'm not the first person to say this, but my biggest takeaway when I started learning about agents was that you get to throw the DAG away. Instead of software engineers coding each step and edge case, you can give the agent a goal and a set of transitions, and let the LLM make decisions in real time to figure out the path.

The promise here is that you write less software: you just give the LLM the "edges" of the graph and let it figure out the nodes. You can recover from errors, you can write less code, and you may find that LLMs come up with novel solutions to problems.
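Here's a rough sketch of that idea. Instead of hard-coding the path, we hand over only the allowed transitions and let a chooser pick each next node at runtime. `choose` is a hypothetical stand-in for the LLM (in practice, a model call returning structured output). The node names, `TRANSITIONS`, and the `max_steps` cap are all assumptions for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical transitions: the "edges" we hand over instead of a fixed path.
TRANSITIONS: Dict[str, List[str]] = {
    "start": ["search_docs", "ask_human"],
    "search_docs": ["draft_reply", "ask_human"],
    "ask_human": ["draft_reply"],
    "draft_reply": ["done", "ask_human"],
}


def find_path(choose: Callable[[str, List[str]], str], max_steps: int = 10) -> List[str]:
    """Walk the graph, letting `choose` (a stand-in for the LLM) pick each next node."""
    node, path = "start", ["start"]
    for _ in range(max_steps):
        options = TRANSITIONS[node]
        nxt = choose(node, options)
        # Reject anything that isn't one of the allowed edges.
        if nxt not in options:
            raise ValueError(f"{nxt!r} is not a valid transition from {node!r}")
        path.append(nxt)
        if nxt == "done":
            return path
        node = nxt
    raise RuntimeError("did not reach 'done' within max_steps")


# A trivial chooser that always takes the first allowed edge:
print(find_path(lambda node, options: options[0]))
# ['start', 'search_docs', 'draft_reply', 'done']
```

If you swap in a chooser that always takes the last edge, it bounces between `ask_human` and `draft_reply` until `max_steps` trips. The edges alone don't guarantee the walk ever finishes.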

As we'll see later, it turns out this doesn't quite work.

Let's dive one step deeper. With agents, you've got this loop consisting of 3 steps:

LLM determines the next step in the workflow, outputting structured JSON ("tool calling")
Deterministic code executes the tool call
The result is appended to the context window
Repeat until the next step is determined to be "done"

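initial_event = {"message": "..."}

async def agent_loop(llm, execute_step, initial_event):
    # The context window starts with just the triggering event
    context = [initial_event]
    while True:
        # The LLM picks the next step as structured JSON (a tool call)
        next_step = await llm.determine_next_step(context)
        context.append(next_step)

        if next_step.intent == "done":
            return next_step.final_answer

        # Deterministic code executes the tool call; the result goes back into context
        result = await execute_step(next_step)
        context.append(result)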

Our initial context is just the starting event (maybe a user message, maybe a cron job firing, maybe a webhook, etc.), and we ask the LLM to choose the next step (tool) or to determine that we're done.

Here's a multi-step example:

[Animation: the agent loop running through several steps (027-agent-loop-animation.mp4)]
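Here's a minimal, self-contained version of the agent loop that actually runs. It's a sketch under a few assumptions. `ScriptedLLM` is a hypothetical stand-in that replays canned steps instead of calling a model. The `add`/`multiply` tools are toys. The `max_turns` guard is an addition the original loop doesn't have.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Step:
    intent: str  # tool name, or "done"
    args: Dict[str, Any] = field(default_factory=dict)
    final_answer: str = ""


class ScriptedLLM:
    """Hypothetical stand-in for a model: replays canned steps, ignoring context."""

    def __init__(self, steps: List[Step]) -> None:
        self._steps = iter(steps)

    async def determine_next_step(self, context: List[Any]) -> Step:
        return next(self._steps)


# Toy tools; a real agent would call APIs, databases, etc. here.
TOOLS = {"add": lambda a, b: a + b, "multiply": lambda a, b: a * b}


async def execute_step(step: Step) -> Dict[str, Any]:
    # Deterministic code runs the tool the "LLM" picked.
    return {"tool": step.intent, "result": TOOLS[step.intent](**step.args)}


async def agent_loop(llm: ScriptedLLM, initial_event: Dict[str, Any],
                     max_turns: int = 10) -> Tuple[str, List[Any]]:
    context: List[Any] = [initial_event]
    for _ in range(max_turns):  # guard instead of `while True`
        next_step = await llm.determine_next_step(context)
        context.append(next_step)
        if next_step.intent == "done":
            return next_step.final_answer, context
        context.append(await execute_step(next_step))
    raise RuntimeError("agent did not finish within max_turns")


# A real model would read 7 from the add result in context; the script hard-codes it.
llm = ScriptedLLM([
    Step("add", {"a": 3, "b": 4}),
    Step("multiply", {"a": 7, "b": 2}),
    Step("done", final_answer="(3 + 4) * 2 = 14"),
])
answer, context = asyncio.run(agent_loop(llm, {"message": "what is (3 + 4) * 2?"}))
print(answer)        # (3 + 4) * 2 = 14
print(len(context))  # 6
```

The final context holds six entries: the initial event, then a step and its result for each tool call, then the "done" step. That growing list is all the model ever sees.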

At the end of the day, this approach just doesn't work as well as we want it to.

In building HumanLayer, I've talked to at least 100 SaaS builders (mostly technical founders) looking to make their existing product more agentic. The journey usually goes something like:

Decide you want to build an agent
Product design, UX mapping, what problems to solve
Want to move fast, so grab $FRAMEWORK and get to building
Get to a 70-80% quality bar
Realize that 80% isn't good enough for most customer-facing features
Realize that getting past 80% requires reverse-engineering the framework, prompts, flow, etc.
Start over from scratch

Random Disclaimers

DISCLAIMER: I'm not sure exactly where to say this, but here seems as good a place as any: this is BY NO MEANS meant to be a dig at either the many frameworks out there or the pretty dang smart people who work on them. They enable incredible things and have accelerated the AI ecosystem.

I hope that one outcome of this post is that agent framework builders can learn from my journey and the journeys of others, and make frameworks even better.

Especially for builders who want to move fast but need deep control.

DISCLAIMER 2: I'm not going to talk about MCP. I'm sure you can see where it fits in.

DISCLAIMER 3: I'm mostly using TypeScript, for reasons, but all this stuff works in Python or any other language you prefer.

Anyways back to the thing...

Design Patterns for great LLM applications

After digging through hundreds of AI libraries and working with dozens of founders, my instinct is this:

There are some core things that make agents great
Going all in on a framework and building what is essentially a greenfield rewrite may be counter-productive
There are some core principles that make agents great, and you will get most/all of them if you pull in a framework
BUT, the fastest way I've seen for builders to get high-quality AI software in the hands of customers is to take small, modular concepts from agent building, and incorporate them into their existing product
These modular concepts from agents can be defined and applied by most skilled software engineers, even if they don't have an AI background

The fastest way I've seen for builders to get good AI software in the hands of customers is to take small, modular concepts from agent building, and incorporate them into their existing product