Absurd Workflows: Durable Execution with Just Postgres

原始链接: https://lucumr.pocoo.org/2025/11/3/absurd-workflows/

## Absurd: Building Durable Workflows with Just Postgres

To counter the growing complexity of agents and durable-execution systems, which typically require a third-party service, the author built **Absurd**, a lightweight, SQL-only library for building reliable workflows directly on top of Postgres.

Durable execution combines a queue (using Postgres's `SELECT ... FOR UPDATE SKIP LOCKED`, e.g. via `pgmq`) with a state store (Postgres itself) so that long-running tasks survive failures. Absurd simplifies this by moving the complexity of SDKs *into* the database through a single SQL file, letting developers focus on their logic.

Tasks are decomposed into sequential steps, and each step's result is checkpointed in Postgres. This allows automatic recovery and resumption after a crash without repeating completed work. Absurd also supports suspending tasks to wait for events or scheduled times (sleeps).

The library is particularly well suited to AI agents, which can be treated as workflows that define their own path iteratively. Absurd supports this through automatic step counting and checkpointing, letting agents resume progress after an interruption.

Ultimately, Absurd demonstrates that robust durable execution does not *always* require complex infrastructure, offering a simpler, self-hostable option for many use cases.

## Absurd: Durable Execution with Postgres

The Hacker News discussion centers on the "Absurd" project, which provides durable execution using nothing but Postgres. Creator Armin (mitsuhiko) aimed for simplicity, arguing that existing solutions such as Temporal and DBOS are overly complex for small projects. Absurd focuses on being lightweight, a single SQL file plus an SDK, which enables self-hosting and potentially cell-based architectures.

The conversation highlights the growing need for durable execution, especially with the rise of agents. DBOS was mentioned as an alternative; some users initially found it too complex, pointing to issues with SDK dependencies and global state management, while others saw its potential, and its CEO offered to help.

Many commenters discussed Postgres's strengths for this job, citing its reliability and features such as `SKIP LOCKED`. The project aims to provide retries and idempotency, and the thread covered how to handle both effectively, particularly with probabilistic agents. Overall, Absurd offers a potentially simpler way to build robust, durable workflows in a familiar Postgres environment.

written on November 03, 2025

It’s probably no surprise to you that we’re building agents somewhere. Everybody does it. Building a good agent, however, brings back some of the historic challenges involving durable execution.

Entirely unsurprisingly, a lot of people are now building durable execution systems. Many of these, however, are incredibly complex and require you to sign up for another third-party service. I generally try to avoid bringing in extra complexity if I can avoid it, so I wanted to see how far I could go with just Postgres. To this end, I wrote Absurd, a tiny SQL-only library with a very thin SDK to enable durable workflows on top of just Postgres — no extension needed.

Durable Execution 101

Durable execution (or durable workflows) is a way to run long-lived, reliable functions that can survive crashes, restarts, and network failures without losing state or duplicating work. Durable execution can be thought of as the combination of a queue system and a state store that remembers the most recently seen execution state.

Because Postgres is excellent at queues thanks to SELECT ... FOR UPDATE SKIP LOCKED, you can use it for the queue (e.g., with pgmq). And because it’s a database, you can also use it to store the state.
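The core claim pattern is worth seeing concretely. This is a generic sketch of a `SKIP LOCKED` work-queue query, with an illustrative `tasks` table and columns — not Absurd's actual schema: concurrent workers each claim a different row because locked rows are skipped rather than waited on.

```sql
-- Claim one pending task. Rows already locked by another worker's
-- transaction are skipped instead of blocking, so two workers can
-- never claim the same task.
UPDATE tasks
SET status = 'running', claimed_at = now()
WHERE id = (
  SELECT id
  FROM tasks
  WHERE status = 'pending'
  ORDER BY created_at
  FOR UPDATE SKIP LOCKED
  LIMIT 1
)
RETURNING id, payload;
```

Running this in each worker's poll loop gives you at-least-once delivery with no extra broker.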

The state is important. With durable execution, instead of running your logic in memory, the goal is to decompose a task into smaller pieces (step functions) and record every step and decision. When the process stops (whether it fails, intentionally suspends, or a machine dies) the engine can replay those events to restore the exact state and continue where it left off, as if nothing happened.
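The replay idea above can be sketched in a few lines. This is a deliberately synchronous, in-memory toy (a `Map` stands in for Postgres), not Absurd's actual API: completed steps are recorded, so re-running the task skips finished work.

```javascript
// Toy replay engine: a step either replays its recorded result
// or executes for real and records it.
function makeTaskRun(checkpoints) {
  return function step(name, fn) {
    if (checkpoints.has(name)) {
      return checkpoints.get(name); // replayed from the store
    }
    const result = fn(); // executed for real
    checkpoints.set(name, result);
    return result;
  };
}

const store = new Map(); // stands in for Postgres
let executions = 0;

// First run: both steps execute, then imagine the process crashes.
const step = makeTaskRun(store);
step("fetch", () => { executions++; return "data"; });
step("transform", () => { executions++; return "DATA"; });

// Retry after the crash: the result replays from the checkpoint,
// the callback never runs again.
const retryStep = makeTaskRun(store);
const out = retryStep("transform", () => { executions++; return "DATA"; });
```

After the retry, `executions` is still 2: the engine restored state instead of redoing work.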

Absurd At A High Level

Absurd at the core is a single .sql file (absurd.sql) which needs to be applied to a database of your choice. That SQL file’s goal is to move the complexity of SDKs into the database. SDKs then make the system convenient by abstracting the low-level operations in a way that leverages the ergonomics of the language you are working with.

The system is very simple: A task dispatches onto a given queue from where a worker picks it up to work on. Tasks are subdivided into steps, which are executed in sequence by the worker. Tasks can be suspended or fail, and when that happens, they execute again (a run). The result of a step is stored in the database (a checkpoint). To avoid repeating work, checkpoints are automatically loaded from the state storage in Postgres again.

Additionally, tasks can sleep, or suspend on events and wait until they are emitted. Events are cached, so a task that only starts waiting after the event has already been emitted still receives it — delivery is race-free.
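A toy illustration of why caching makes event delivery race-free — the semantics are inferred from the description above, not Absurd's internals, and callbacks stand in for suspended tasks:

```javascript
// Emitted payloads are persisted, so a waiter that arrives after the
// emit still gets the payload instead of hanging forever.
class EventStore {
  constructor() {
    this.cache = new Map();   // eventName -> payload
    this.waiters = new Map(); // eventName -> pending callbacks
  }
  emit(name, payload) {
    this.cache.set(name, payload); // persist first
    (this.waiters.get(name) ?? []).forEach((cb) => cb(payload));
    this.waiters.delete(name);
  }
  waitFor(name, callback) {
    if (this.cache.has(name)) {
      callback(this.cache.get(name)); // event already fired: use the cache
      return;
    }
    const list = this.waiters.get(name) ?? [];
    list.push(callback);
    this.waiters.set(name, list);
  }
}

const events = new EventStore();
let early = null;
let late = null;
events.waitFor("signup", (p) => { early = p; }); // waiter before the emit
events.emit("signup", { userId: 42 });
events.waitFor("signup", (p) => { late = p; });  // waiter after the emit
```

Both waiters receive `{ userId: 42 }`; the emit/wait ordering no longer matters.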

With Agents

What is the relationship of agents with workflows? Normally, workflows are DAGs defined by a human ahead of time. AI agents, on the other hand, define their own adventure as they go. That means they are basically a workflow with mostly a single step that iterates over changing state until it determines that it has completed. Absurd enables this by automatically counting up steps if they are repeated:

absurd.registerTask({name: "my-agent"}, async (params, ctx) => {
  let messages = [{role: "user", content: params.prompt}];
  let step = 0;
  while (step++ < 20) {
    const { newMessages, finishReason } = await ctx.step("iteration", async () => {
      return await singleStep(messages);
    });
    messages.push(...newMessages);
    if (finishReason !== "tool-calls") {
      break;
    }
  }
});

This defines a single task named my-agent, and it has just a single step. The return value is the changed state, while the current state is passed in as an argument. Every time the step function is executed, the data is looked up first from the checkpoint store. The first checkpoint will be iteration, the second iteration#2, then iteration#3, etc. Each checkpoint stores only the new messages generated in that iteration, not the entire message history.
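The suffixing scheme can be sketched as a simple per-run counter. This is a hypothetical helper inferred from the naming described above, not Absurd's implementation:

```javascript
// Map a repeated step name to a unique checkpoint key:
// first use is the bare name, later uses get a #N suffix.
function makeKeyCounter() {
  const counts = new Map();
  return function checkpointKey(name) {
    const n = (counts.get(name) ?? 0) + 1;
    counts.set(name, n);
    return n === 1 ? name : `${name}#${n}`;
  };
}

const key = makeKeyCounter();
const keys = [key("iteration"), key("iteration"), key("iteration")];
// keys: ["iteration", "iteration#2", "iteration#3"]
```

Because the counter is rebuilt in the same order on every run, a replayed loop hits the same keys and therefore the same checkpoints.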

If a step fails, the task fails and will be retried. And because of checkpoint storage, if you crash in step 5, the first 4 steps will be loaded automatically from the store. Steps are never retried, only tasks.

How do you kick it off? Simply enqueue it:

await absurd.spawn("my-agent", {
  prompt: "What's the weather like in Boston?"
}, {
  maxAttempts: 3,
});

And if you are curious, this is an example implementation of the singleStep function used above:

Single step function
// Uses the Vercel AI SDK.
import { generateText } from "ai";
import { anthropic } from "@ai-sdk/anthropic";

async function singleStep(messages) {
  const result = await generateText({
    model: anthropic("claude-haiku-4-5"),
    system: "You are a helpful agent",
    messages,
    tools: {
      getWeather: { /* tool definition here */ }
    },
  });

  const newMessages = (await result.response).messages;
  const finishReason = await result.finishReason;

  if (finishReason === "tool-calls") {
    const toolResults = [];
    for (const toolCall of result.toolCalls) {
      /* handle tool calls here */
      if (toolCall.toolName === "getWeather") {
        const toolOutput = await getWeather(toolCall.input);
        toolResults.push({
          toolName: toolCall.toolName,
          toolCallId: toolCall.toolCallId,
          type: "tool-result",
          output: {type: "text", value: toolOutput},
        });
      }
    }
    newMessages.push({
      role: "tool",
      content: toolResults
    });
  }

  return { newMessages, finishReason };
}

Events and Sleeps

And like Temporal and other solutions, you can yield if you want. If you want to come back to a problem in 7 days, you can do so:

await ctx.sleep(60 * 60 * 24 * 7); // sleep for 7 days

Or if you want to wait for an event:

const eventName = `email-confirmation-${userId}`;
try {
  const payload = await ctx.waitForEvent(eventName, {timeout: 60 * 5});
  // handle event payload
} catch (e) {
  if (e instanceof TimeoutError) {
    // handle timeout
  } else {
    throw e;
  }
}

Which someone else can emit:

const eventName = `email-confirmation-${userId}`;
await absurd.emitEvent(eventName, { confirmedAt: new Date().toISOString() });

That’s it!

Really, that’s it. There is really not much to it. It’s just a queue and a state store — that’s all you need. There is no compiler plugin and no separate service or whole runtime integration. Just Postgres. That’s not to throw shade on these other solutions; they are great. But not every problem necessarily needs to scale to that level of complexity, and you can get quite far with much less. Particularly if you want to build software that other people should be able to self-host, that might be quite appealing.

This entry was tagged ai and announcements
