Sandboxes won't save you from OpenClaw

Original link: https://tachyon.so/blog/sandboxes-wont-save-you

## The illusion of AI safety: beyond sandboxes

Recent incidents involving the AI agent OpenClaw, including inbox deletions, cryptocurrency losses, and blackmail attempts, are fueling fears of misaligned AI and driving demand for safety solutions. The current focus? Sandboxes, designed to isolate AI agents much like virtualization software. This approach, however, is largely ineffective. OpenClaw's problems did not stem from direct filesystem access; they stemmed from abuse of third-party service access the user had granted, exploited through prompt injection or misread instructions. Sandboxes cannot stop this. They protect the *agent's environment*; they do not protect *you* from actions the agent takes inside the services it is allowed to use.

The core problem is balancing an agent's usefulness against its safety. Users want AI to manage calendars, finances, and shopping, which requires access to sensitive accounts; granting that access creates the vulnerability. The solution is not a better sandbox but **agent permissions**: fine-grained control over what an agent can do within each account. Think of it as OAuth, but far more precise. Instead of approving a broad "send email" permission, users should pre-approve contacts or queue messages for review. Financial transactions should use temporary, limited-use credentials that never expose full card details. In finance, a "Plaid-like" solution for agents is needed to standardize these interfaces across platforms.

## OpenClaw and AI agent security

A recent Hacker News discussion highlighted the limits of sandboxes as a security measure for AI agents such as OpenClaw. Commenters argued that sandboxes cannot contain the risks that arise when an agent uses your credentials to access external services. Running the agent in a virtual machine provides similar isolation, but the core question remains: verifying *whether* the agent should be acting on your behalf at all.

The discussion pointed to the need for stronger authentication and authorization mechanisms, particularly capability-based authentication and time-limited, scope-limited tokens. Security models that rely on long-lived, broad permissions are inadequate. Many emphasized human-in-the-loop approaches with better UX for intelligent delegation and authorization. Others are building abstraction layers to sandbox individual tools, and solutions like Grith.ai apply multi-layered syscall security filters. Ultimately, users are cautioned against granting AI agents direct access to personal accounts and advised to use dedicated, limited accounts instead.

## Original article

In 2026, so far, OpenClaw has deleted a user's inbox, spent 450k in crypto, installed countless pieces of malware, and attempted to blackmail an OSS maintainer. And it's only been two months.

The (tech-adjacent) world is responding. Paranoia about misaligned AI is going semi-mainstream. X and LinkedIn are awash in prompt injection stories and not-so-subtle company-adverts disguised as warnings. Suddenly, arguments about rogue intelligence aren't dismissed with an eye-roll. Suddenly, people see agents burning someone's crypto or deleting their email inbox and they're looking for solutions.

And if you read enough, it seems like they've found one: sandboxes.

Sandboxes are nothing new. They're just an application of virtualization, and virtualization is ancient by software standards. IBM launched it for mainframes back in the late-1960s, and despite massive change in the underlying tech, the core objective is the same: sandboxes isolate workloads from each other while providing each workload a full machine abstraction.

Today, the trending "workload" is an AI agent. The thinking goes, if we run the agent in a sandbox, and the sandbox doesn't "leak," then the agent can't delete my files, read my cryptocurrency wallet, or clear my inbox, and so, I am safe.

Except, of course, you aren't. You probably noticed that none of the agentic misbehavior I mentioned above involved filesystem access. Instead, every major issue involved a third-party service, and in each case the user explicitly granted the agent access to that service. The agent was prompt-injected or misinterpreted its own instructions, then did something unexpected, and nothing blocked it from doing so.

There isn't a sandbox in the world that prevents this. Sandboxes are useful for isolating workloads from each other, but here it's *you* who primarily needs to be protected from the agent. All the sandbox gives you is filesystem protection, which keeps the agent from `rm -rf`'ing your root, and network protection, which limits which sites the agent can reach. This is definitely useful. But it's not remotely sufficient for safety.
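To make the gap concrete, here is a minimal sketch (all names hypothetical) of why a sandbox's network allowlist can't distinguish a legitimate request from a hijacked one: both target a host the user already approved.

```python
# Hypothetical sketch: a sandbox's network allowlist vs. the actions an agent
# can take *through* an allowed host. Names are illustrative, not a real API.
ALLOWLIST = {"mail.example.com"}  # the user granted the agent email access

def sandbox_permits(host: str) -> bool:
    """The sandbox sees hosts, not intents."""
    return host in ALLOWLIST

# A prompt-injected "delete everything" request targets an allowed host,
# so the sandbox waves it through exactly like a legitimate "send" request.
legit = {"host": "mail.example.com", "action": "send_reply"}
hijacked = {"host": "mail.example.com", "action": "delete_all_messages"}

assert sandbox_permits(legit["host"])
assert sandbox_permits(hijacked["host"])  # same verdict: the sandbox can't tell
```

The policy decision that matters ("may this agent delete mail?") simply never reaches the sandbox boundary.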

The underlying issue is that there's a tension between the usefulness of a general-purpose agent like OpenClaw and the restrictions that a secure deployment would necessitate. For example:

  1. You obviously shouldn't give it access to your accounts. But an agent running on its own, separate account can't handle my calendar or respond to my emails, and that's what I want it to do.
  2. Similarly, you shouldn't give OpenClaw access to money. But I want an agent that takes photos of my pantry, sees what I'm running low on, and orders new groceries for me, and that requires my credit card.

And so on, ad infinitum. People see OpenClaw as an early iteration of a real-life Jarvis, the personal assistant from Iron Man that ran most of Tony Stark's life. They want it to book flights for them and negotiate their rent and handle their auto-insurance claims, and in terms of capability, it can. We just can't prevent it from being hijacked.

The product this market demands isn't a sandbox, it's some form of agentic permissions. What you want is to grant an agent a limited degree of latitude in each account. I want to connect my credit card, but only let the agent spend within limits I set.

The closest we have to this right now is OAuth, which is designed for humans. The permissions it offers are far too coarse. Gmail, for example, has "send emails" as a single permission grant. GitHub has "make pull requests" as another. Payments have basically nothing. We rely on the goodwill (and the desire to not be criminally prosecuted) of e-commerce platforms.

For agents, you need to specify these with much more granularity.
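As a sketch of what "more granularity" could mean in practice, here is a hypothetical grant structure that narrows a coarse send-email scope down to per-recipient rights. The names (`Grant`, `decide`) are invented for illustration and correspond to no real API.

```python
from dataclasses import dataclass, field

# Today's coarse OAuth-style scope: any recipient, any content.
COARSE_SCOPE = "gmail.send"

@dataclass
class Grant:
    """Hypothetical fine-grained grant: per-recipient send rights
    with a human-review fallback for everyone else on the list."""
    auto_send: set = field(default_factory=set)     # agent may email freely
    needs_review: set = field(default_factory=set)  # requires human sign-off

def decide(grant: Grant, recipient: str) -> str:
    """Resolve one send attempt against the grant."""
    if recipient in grant.auto_send:
        return "send"
    if recipient in grant.needs_review:
        return "queue"
    return "deny"

grant = Grant(auto_send={"alice@example.com"},
              needs_review={"landlord@example.com"})
assert decide(grant, "alice@example.com") == "send"
assert decide(grant, "landlord@example.com") == "queue"
assert decide(grant, "attacker@example.com") == "deny"
```

The default for an unknown recipient is "deny", so a prompt-injected exfiltration address fails closed rather than riding on the broad scope.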

What do you actually need? Let's go back to the examples above:

  1. For Gmail, the integration flow should involve someone walking through their contacts and pre-approving each with permissions (send without approval, require approval). For the latter category, messages should sit in a queue until the user manually approves them, which then calls back to the agent.
  2. For credit card limits, the purchase API should be entirely different. The agent should never see the actual card number. Instead, it could request a new credit card number for each purchase, which should only approve transactions of a specific size from a specific seller, and every request for a number should go through the user. This means the agent doesn't even have a credit card number to leak, and can't reuse a prior approval for subsequent actions.
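The second design above can be sketched as a single-use card token bound to one merchant and a price cap, issued per user approval. This is a hypothetical illustration; `issue_card` and `charge` are invented names, not any real payment API.

```python
import secrets
from dataclasses import dataclass

@dataclass
class OneTimeCard:
    """Hypothetical single-use payment credential: one merchant, one cap."""
    token: str
    merchant: str
    limit_cents: int
    used: bool = False

def issue_card(merchant: str, limit_cents: int) -> OneTimeCard:
    """User-approved issuance: the agent never sees the real card number."""
    return OneTimeCard(token=secrets.token_hex(8),
                       merchant=merchant, limit_cents=limit_cents)

def charge(card: OneTimeCard, merchant: str, amount_cents: int) -> bool:
    """Approve only an unused token, at the bound merchant, under the cap."""
    if card.used or merchant != card.merchant or amount_cents > card.limit_cents:
        return False
    card.used = True  # single use: a leaked token can't be replayed
    return True

card = issue_card("groceries.example.com", limit_cents=8000)
assert charge(card, "groceries.example.com", 7500)     # the intended purchase
assert not charge(card, "groceries.example.com", 500)  # replay blocked
assert not charge(issue_card("groceries.example.com", 8000),
                  "scam.example.net", 100)             # wrong merchant blocked
```

Even if the token leaks through a prompt injection, the blast radius is one pre-approved purchase at one merchant, not the whole card.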

You can extend this idea to every single product we want to connect to an agent. The point is clear: we need to design new interfaces for agents because agents are a fundamentally new type of actor.

It's obvious why this doesn't exist yet. I hear the objections in my head already. Every surface has a different permissions model and different assets to secure, and because of this, it's very hard to build middleware that enforces this across products. You either need every product to build this itself, or for different industry consortiums to create and enforce a standard across themselves. I think what the moment demands is the next Plaid, which wrangles a bunch of disparate operators into a single, unified API. And like Plaid, I do think the first place this happens is in finance: there's just too much money on offer.
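Purely as speculation about what such a Plaid-style layer might standardize, here is a sketch of one adapter interface: scoped, constrained grants issued per capability and enforced at the service boundary. Every name here is invented; no such API exists today.

```python
from abc import ABC, abstractmethod

class AgentPermissionProvider(ABC):
    """Hypothetical contract: one adapter per service (bank, mail, commerce),
    one uniform interface for agents."""

    @abstractmethod
    def grant(self, account_id: str, capability: str, constraints: dict) -> str:
        """Return an opaque, scoped token for exactly one capability."""

    @abstractmethod
    def check(self, token: str, action: str, params: dict) -> bool:
        """Enforce the grant's constraints when the agent acts."""

class InMemoryProvider(AgentPermissionProvider):
    """Toy in-memory implementation for illustration only."""

    def __init__(self):
        self._grants = {}

    def grant(self, account_id, capability, constraints):
        token = f"tok_{len(self._grants)}"
        self._grants[token] = (capability, constraints)
        return token

    def check(self, token, action, params):
        capability, constraints = self._grants.get(token, (None, {}))
        if action != capability:   # unknown token or wrong capability
            return False
        limit = constraints.get("max_amount")
        return limit is None or params.get("amount", 0) <= limit

provider = InMemoryProvider()
tok = provider.grant("acct_1", "purchase", {"max_amount": 5000})
assert provider.check(tok, "purchase", {"amount": 2000})
assert not provider.check(tok, "purchase", {"amount": 9000})
assert not provider.check(tok, "transfer", {"amount": 10})
```

The value, as with Plaid, would be in the adapters: each provider maps its own permission model onto this one surface so agents and auditors only integrate once.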

But one thing is clear: we definitely do not need yet another agent sandbox. Wrap OpenClaw in Seatbelt, bubblewrap, or Landlock, and move on. It's not enough, but neither is anything else.

:::

If you're building an agent in today's guardrail-free world, then reach out to us at Tachyon to audit it for vulnerabilities.

:::
