所有你的代理都将异步运行。

所有你的代理都将异步运行。
All your agents are going async

原始链接: https://zknill.io/posts/all-your-agents-are-going-async/

## 向异步AI代理的转变与传输问题 AI代理正在从同步聊天机器人（如ChatGPT）——需要持续的人工交互，演变为在后台*为您*工作的流程——安排任务、通过WhatsApp回复、独立运行。这种转变打破了传统的基于HTTP的通信模式，该模式专为即时请求-响应设计，并且难以处理在较长时间内运行的代理。核心问题：HTTP无法处理那些存活时间超过连接、主动推送更新、适应变化的用户/设备或支持多个协作者的代理。Anthropic的Routines和Cloudflare Agents等现有解决方案解决了*持久状态*（代理数据存储的位置），但仍然依赖于低效的轮询或HTTP请求来获取更新——本质上，就是检查新信息。 OpenClaw展示了一种更好的方法，使用外部聊天平台（如WhatsApp）进行无缝的异步通信，但缺乏企业级基础设施。理想的解决方案需要同时具备持久状态*和*持久传输。像Ably这样的公司正在构建专门为此设计的平台，利用实时消息传递创建持久的“会话”，代理和人类可以在其中连接/断开连接，而不会丢失上下文或数据——这才是异步AI的真正基础。

最近的 Hacker News 讨论围绕一篇名为“All your agents are going async” (zknill.io) 的文章展开。文章探讨了人工智能代理中异步功能日益普及，以及这给使用标准 HTTP Server-Sent Events (SSE) 传递响应令牌带来的挑战。一位评论者质疑文章的解决方案，认为它似乎增加了现有重复工作，而不是解决跨兼容性的根本问题。他们建议需要标准化协议，并提到 atproto 作为潜在的传输和存储解决方案。文章作者澄清误解，表示这不是一次推销，尽管他供职于一家拥有相关产品的发布/订阅公司。他强调文章的重点是异步代理的好处，以及 SSE 在这种情况下存在的困难。

原文

Agents used to be a thing you talked to synchronously. Now they’re a thing that runs in the background while you work. When you make that change, the transport breaks.

For most of the time LLMs have been around, you use them by opening a chat-style window and typing a prompt. The LLM streams the response back token-by-token. It’s how ChatGPT, claude.ai, and Claude Code work. It’s also how the demos work for basically every AI SDK or AI Library. It’s easy to think that LLM chatbots are the ‘art of the possible’ for AI right now. But that’s not the case.

Standard design around single http request

Instead, all your agents are going async. Agents are getting crons, webhook support, whatsapp integrations, ‘remote control’ from your phone, scheduled tasks and routines. Agents are becoming something that runs in the background, working while you work, and reporting back results async. Agents are getting workflows in Temporal, Vercel WDK, Relay.app, etc. A human sitting at a terminal or webchat is just one mode now, and increasingly it’s not the interesting one. The interesting thing is what agents can do while not being synchronously supervised by a human.

The problem is that chatbots are primarily built on HTTP. An HTTP request with the prompt, and a SSE stream of LLM generated tokens back on the HTTP response. But this doesn’t work when the agent is running async. There’s no HTTP connection to stream the response back.

OpenClaw’s async step

OpenClaw took a big step towards async agents, by showing people that an agent could live in your WhatsApp chat. The agent could travel around with you, and could work on stuff in the background. OpenClaw showed that you didn’t have to be glued to your browser or terminal to get AI to do work for you.

Openclaw channels model

Anthropic’s direct response to the OpenClaw model is Channels, which is MCP based and allows you to push messages async from an external chat system into a Claude Code session. But they also have /loop and /schedule slash commands, as well as Routines, both allowing you to schedule and run agents in the background. Anthropic also has Remote Control, which lets you continue a Claude Code session from your phone or another browser.

ChatGPT has scheduled tasks which trigger agents async, that can reach out to you if needed.

Cursor has background agents that run in the background in the cloud.

All of these features are about breaking the coupling between a human sitting at a terminal or chat window and interacting turn-by-turn with the agent. They make interactions with agents continuous, remote, long-running, and async.

The transport mismatch

All these new async features share the same property; the lifetime of an agent’s work is decoupled from the lifetime of a single HTTP connection. In chatbot demo apps, the agent is only processing while the HTTP connection is open. The LLM is doing inference in response to an HTTP request and streaming the tokens back on the HTTP response as an SSE stream. I’ve said before that a chatbot’s worst enemy is page refresh, and this is entirely because of the transport mismatch. HTTP request-response can’t survive a page refresh, and it can’t serve async agents.

There are four scenarios that the old transport based on HTTP can’t handle cleanly:

Agent outlives the caller: A routine fires from a cron, or the agent takes a long time to do its work. Five minutes later the agent has a result, but no one’s listening anymore. Where do the results go? Right now, they go in a database and you have to poll for them with some session URL (which, y’know, sucks).
Agent wants to push unprompted: The agent finished a nightly backlog review and has three PRs for you to review. Or your async workflow hits a human-approval step and needs you to say yes before it can keep going. There’s no connection back to you. Right now, they email you or send a slack message.
Caller changes: You started a task at your desk, went to lunch, and want to check on it from your phone. Anthropic’s Remote Control handles this, but only by building custom backend session storage and management. It’s not a first-class feature of the HTTP transport.
Multiple humans in one session: You have a team of five people working on a task together, and an agent is helping you. The agent needs to be able to push updates to all five of you, and take input from any of you.

Part of the reason that folks found OpenClaw so awesome is that it handles all of these scenarios for you. OpenClaw’s model separates the lifetime of the agent’s work from the lifetime of the connection to the human. The agent can do work async, and then use WhatsApp, iMessage, Telegram, Discord or whatever async chat system to push the results back to you when it’s done.

This just isn’t possible with HTTP request-response.

So how are folks solving this?

Looking across the industry, there are a bunch of different solutions to this. Clearly there’s the OpenClaw model where all the interaction is through some external chat provider. This chat provider also provides the conversation history to the agent, so the agent can have context on the conversation even after restarts. But this is just an extension on the chat-based model. I don’t think it’s the most interesting solution.

Most folks are pulling more and more of the session state into a centralised and hosted environment. Anthropic is doing this with Routines and Remote Control. More of the session state, conversation history, and agent inference is running in the hosted Anthropic platform. They are consolidating more of the agent lifecycle and agent connection state into their own platform, rather than just being an LLM inference API.

Cloudflare are getting involved too with their own Agents platform built on their workers platform. The Cloudflare Sessions API provides the session and conversation storage for agents, accessible over HTTP. And to fix the async notification problem, Cloudflare has launched their Email for Agents product.

These solutions solve only one half

The problem actually splits into two halves. The first half is durable state. Where does the agent’s state live, how does the agent have access to that state on restart or when processing async tasks, and where does the agent store its output? The second half is durable transport. How do the bytes of the response get between the agent and the humans or other agents, how does the connection survive disconnect, device switch, fan-out, server-initiated push, etc?

The Anthropic and Cloudflare solutions are really focused on the first half of the problem. They are building durable state storage and management for agents. Their solution to getting the bytes of the response is still polling, or HTTP requests. Cloudflare does have websocket support, but it doesn’t survive disconnections for streaming LLM responses. Anthropic and Cloudflare’s solution is based on the idea that if they store all the data required, then clients can always HTTP GET that data later. It half works, but it’s not ‘art of the possible’.

Durable transport, durable state

Right now, the session and the transport are all wrapped up in a single HTTP request-response. Cloudflare and Anthropic’s hosted features go some way to making the session state durable, but the don’t fix the transport problem. You’re still stuck with HTTP gets, or polling, in order to find out something new has happened.

Looking at the OpenClaw model, where the conversation history is in the chat channel and the agent process and LLM provider are both separated from that, you can’t build the same design on Cloudflare or Anthropic. There’s no ’enterprise’ version of the OpenClaw channels model that you can run with your own infrastructure. There’s no durable transport and durable state solution.

I work for Ably and we are currently building a durable transport for AI agents built around the idea of a session. We’re building on top of our existing realtime messaging platform. It’s frustrating to see folks fighting the similar problems over and over again, all because they picked the wrong transport to start with: HTTP request-response.

Ably’s transport

A ‘session’ with an AI should be a thing that humans and agents can connect to, and disconnect from at any time. They should be able to come and go, the session should survive wifi issues, or your phone disconnecting when you go into a train tunnel without the human or the agent needing to care about it. This is what you can do with OpenClaw, and with modern async agentic applications. The conversation state should be accessible through the durable session, and humans and agents should be able to notify each other through the session. The session should be a first class primitive for building async agents.

Because we’re building on our existing realtime messaging platform, we’re approaching the same problem that Cloudflare and Anthropic are approaching, but we’ve already got a bi-directional, durable, realtime messaging transport, which already supports multi-device and multi-user. We’re building session state and conversation history onto that existing platform to solve both halves of the problem; durable transport and durable state.