Head in the Zed Cloud

Original link: https://maxdeviant.com/posts/2025/head-in-the-zed-cloud/

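## Zed Cloud: A Backend Rebuild for Scalability and Testing

Over the past five months, the Zed team has been rebuilding its core backend infrastructure, "Collab", to support the platform's growing user base. The new system, **Zed Cloud**, is built in Rust and uses **Cloudflare Workers** and **WebAssembly (Wasm)** to minimize operational overhead and maximize scalability.

By relying on Cloudflare's managed services, the team aims to spend more time developing Zed and less time maintaining infrastructure. A key component of the rebuild is a custom **Platform trait** with two implementations: **CloudflarePlatform** for production and **SimulatedPlatform** for comprehensive testing.

This platform-agnostic approach lets developers write code that runs unchanged in the live environment and in tightly controlled test scenarios, which enables end-to-end tests, including ones that span Zed's UI and the backend. The team demonstrates this with a test case involving Orb webhook ingestion.

Zed Cloud is the foundation for future features such as collaborative coding with DeltaDB. The team is currently hiring Rust engineers with web API and platform experience to contribute to its ongoing development.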

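## Zed Cloud Update: Focus on Collaboration and WebAssembly

A recent Hacker News discussion highlighted some interesting developments around the Zed code editor. Although Zed's AI features are available, users showed more enthusiasm for its collaboration tools, particularly a multiplayer debugger. They hope this collaboration can extend beyond Zed and work across different editors.

Zed Cloud compiles its Rust code to WebAssembly (Wasm); the discussion brought up WASM3 and the idea of Wasm as a portable "assembly for everywhere". This makes cross-platform command-line tools possible and lays the groundwork for future compatibility. The discussion also covered Wasm performance: a 10-20% slowdown compared with native compilation, which is acceptable for many use cases.

The backend uses Cloudflare Workers, which make deployment and scaling easy, but there are concerns about vendor lock-in (Cloudflare's runtime is open source). Supabase Edge Functions (also open source) and AWS Lambda with Rust were discussed as well, with performance and cost in mind. Ultimately, Zed prioritizes convenience and portability, using Wasm to lay the groundwork for DeltaDB-powered collaborative coding features.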

Original text

For the past five months I've been leading the efforts to rebuild Zed's cloud infrastructure.

Our current backend, known as Collab, has been chugging along since basically the beginning of the company. We use Collab every day to work together on Zed in Zed. However, as Zed continues to grow and attract more users, we realized that we needed a full reboot of our backend infrastructure to set us up for our future endeavors.

Enter Zed Cloud.

Like Zed itself, Zed Cloud is built in Rust.

This time around there is a slight twist: all of this is running on Cloudflare Workers, with our Rust code being compiled down to WebAssembly (Wasm).

Why Cloudflare Workers?

One of our goals with this rebuild was to reduce the amount of operational effort it takes to maintain our hosted services, so that we can focus more of our time and energy on building Zed itself.

Cloudflare Workers allow us to easily scale up to meet demand without having to fuss over it too much.

Additionally, Cloudflare offers an ever-growing number of managed services that cover anything you might need for a production web service. Here are some of the Cloudflare services we're using today:

The Platform

Another one of our goals with this rebuild was to build a platform that was easy to test. To achieve this, we built our own platform framework on top of the Cloudflare Workers runtime APIs.

At the heart of this framework is the Platform trait:

```rust
pub trait Platform: Sized + Clone + 'static {
    type Cache: cache::Cache;
    type Clock: Clock;
    type KvStore: KvStore<Self>;
    type ServiceBinding: Fetcher<Self>;
    type DurableObjectNamespace: durable_object::DurableObjectNamespace<Self>;
    type DurableObjectStub: durable_object::DurableObjectStub<Self>;
    type DurableObjectState: durable_object::DurableObjectState<Self>;
    type RateLimiter: rate_limiter::RateLimiter<Self>;
    type SqlStorage: sql::SqlStorage;
    type PostgresConnection<T>: postgres::PostgresConnection<Self, T>;
    type PostgresTransaction<T>: postgres::PostgresTransaction<Self, T>;
    type ExecutionContext: ExecutionContext + Clone + Unpin;
    type Environment: Environment<Self> + Clone + Unpin;
    type ClientWebSocket: websocket::ClientWebSocket;
    type ServerWebSocket: websocket::ServerWebSocket<Self>;
    type WebSocketReceiver: websocket::WebSocketReceiver;
    type WebSocketSender: websocket::WebSocketSender;
    type HttpClient: http_client::HttpClient<Platform = Self>;
    type Queue<T: Serialize + 'static>: queue::Queue<Self, T>;
    type RawQueueMessageBatch: queue::RawQueueMessageBatch<Self>;
    type QueueMessageBatch<T: DeserializeOwned + 'static>: queue::QueueMessageBatch<Self, T>;
    type QueueMessage<T: DeserializeOwned + 'static>: queue::QueueMessage<Self, T>;
    type Rng: Clone + RngCore;

    fn websocket_pair() -> Result<(Self::ClientWebSocket, Self::ServerWebSocket)>;
}
```

This trait allows us to write our code in a platform-agnostic way while still leveraging all of the functionality that Cloudflare Workers has to offer. Each one of these associated types corresponds to some aspect of the platform that we'll want to have control over in a test environment.

For instance, if we have a service that needs to interact with the system clock and a Workers KV store, we would define it like this:

```rust
pub struct BillingService<P: Platform> {
    clock: P::Clock,
    kv_store: P::KvStore,
}
```
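
To make the idea concrete, here is a minimal, self-contained sketch of how a service written against such a trait might be exercised with an in-memory platform. Everything in it is an assumption made for illustration: the `Clock` and `KvStore` traits are cut down to a couple of synchronous methods (the real `KvStore` is parameterized by the platform and Workers KV is asynchronous), and `SimulatedClock`, `SimulatedKvStore`, `start_trial`, and `trial_expired` are hypothetical names, not Zed Cloud's actual API.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical, heavily trimmed stand-in for the real clock trait.
pub trait Clock {
    /// Current time in milliseconds since an arbitrary epoch.
    fn now_ms(&self) -> u64;
}

/// Hypothetical, synchronous stand-in for a key-value store trait.
pub trait KvStore {
    fn get(&self, key: &str) -> Option<String>;
    fn put(&self, key: &str, value: String);
}

/// Trimmed version of the trait with only the two associated types used here.
pub trait Platform: Sized + Clone + 'static {
    type Clock: Clock;
    type KvStore: KvStore;
}

/// Test-only clock whose time moves only when the test advances it.
#[derive(Clone, Default)]
pub struct SimulatedClock {
    now_ms: Rc<Cell<u64>>,
}

impl SimulatedClock {
    pub fn advance(&self, ms: u64) {
        self.now_ms.set(self.now_ms.get() + ms);
    }
}

impl Clock for SimulatedClock {
    fn now_ms(&self) -> u64 {
        self.now_ms.get()
    }
}

/// Test-only store backed by an in-memory map shared between clones.
#[derive(Clone, Default)]
pub struct SimulatedKvStore {
    entries: Rc<RefCell<HashMap<String, String>>>,
}

impl KvStore for SimulatedKvStore {
    fn get(&self, key: &str) -> Option<String> {
        self.entries.borrow().get(key).cloned()
    }

    fn put(&self, key: &str, value: String) {
        self.entries.borrow_mut().insert(key.to_string(), value);
    }
}

#[derive(Clone)]
pub struct SimulatedPlatform;

impl Platform for SimulatedPlatform {
    type Clock = SimulatedClock;
    type KvStore = SimulatedKvStore;
}

pub struct BillingService<P: Platform> {
    clock: P::Clock,
    kv_store: P::KvStore,
}

impl<P: Platform> BillingService<P> {
    pub fn new(clock: P::Clock, kv_store: P::KvStore) -> Self {
        Self { clock, kv_store }
    }

    /// Records the trial start time for a user, unless one is already stored.
    pub fn start_trial(&self, user_id: &str) {
        let key = format!("trial_started:{user_id}");
        if self.kv_store.get(&key).is_none() {
            self.kv_store.put(&key, self.clock.now_ms().to_string());
        }
    }

    /// Returns true once `trial_length_ms` has elapsed since the stored start.
    pub fn trial_expired(&self, user_id: &str, trial_length_ms: u64) -> bool {
        let key = format!("trial_started:{user_id}");
        match self.kv_store.get(&key).and_then(|v| v.parse::<u64>().ok()) {
            Some(started) => self.clock.now_ms() >= started + trial_length_ms,
            None => false,
        }
    }
}
```

Because the service only names `P::Clock` and `P::KvStore`, a test can construct `BillingService::<SimulatedPlatform>`, call `start_trial`, advance the simulated clock, and check `trial_expired` without any real time passing.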

Two platforms, both alike in dignity

There are two implementors of the Platform trait: CloudflarePlatform and SimulatedPlatform.

CloudflarePlatform

CloudflarePlatform—as the name might suggest—is an implementation of the platform on top of the Cloudflare Workers runtime. This implementation targets Wasm and is what we run when developing locally (using Wrangler) and in production.

We have a cloudflare_bindings crate that contains wasm_bindgen bindings to the Cloudflare Workers JS runtime. You can think of CloudflarePlatform as the glue between those bindings and the idiomatic Rust APIs exposed by the Platform trait.

SimulatedPlatform

The SimulatedPlatform is used when running tests, and allows for simulating almost every part of the system in order to effectively test our code.

Here's an example of a test for ingesting a webhook from Orb:

```rust
#[test]
fn test_orb_webhook_ingestion() {
    Simulator::once(|simulator| async move {
        let test_ctx = OrbWebhooksTestContext::init(&simulator).await?;

        // Some more test setup...

        let request = make_orb_webhook_request(
            HANDLE_ORB_WEBHOOK_URL,
            &webhook_event,
            "2025-09-10T18:16:06.483Z".parse().unwrap(),
            &test_ctx.config.orb_webhook_signing_secret,
        )?;

        let response = test_ctx.worker.fetch(request).await?;
        assert_eq!(response.status, StatusCode::OK);

        simulator.scheduler.run()?;

        let updated_billing_subscription = BillingSubscriptionRepository
            .find(&test_ctx.app_database, billing_subscription.id)
            .await?;
        assert_eq!(
            updated_billing_subscription.kind,
            Some(app_database::SubscriptionKind::TokenBasedZedPro)
        );
        assert_eq!(
            updated_billing_subscription.orb_subscription_status,
            Some(app_database::OrbSubscriptionStatus::Active)
        );
    })
    .unwrap();
}
```

In this test we're able to test the full end-to-end flow of:

  1. Receiving and validating an incoming webhook event to our webhook ingestion endpoint
  2. Putting the webhook event into a queue
  3. Consuming the webhook event in a background worker and processing it

The call to simulator.scheduler.run()? advances the test simulator, in this case running the pending queue consumers.
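
The following sketch illustrates, under simplifying assumptions, why an explicit `run()` step makes this kind of test deterministic: queued work is only recorded when a message is sent, and nothing is consumed until the test tells the scheduler to run. The `Scheduler`, `Queue`, and `handle_webhook` types below are hypothetical, synchronous stand-ins written for this example. They are not the real scheduler crate, which drives async tasks.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::rc::Rc;

type Task = Box<dyn FnOnce()>;

/// Hypothetical deterministic scheduler: tasks run only when `run` is called.
#[derive(Clone, Default)]
pub struct Scheduler {
    pending: Rc<RefCell<VecDeque<Task>>>,
}

impl Scheduler {
    /// Records a task without running it.
    pub fn spawn(&self, task: impl FnOnce() + 'static) {
        self.pending.borrow_mut().push_back(Box::new(task));
    }

    /// Runs tasks in FIFO order until none remain and returns how many ran.
    pub fn run(&self) -> usize {
        let mut ran = 0;
        loop {
            // Pop in a separate statement so the borrow is released before
            // the task runs; a task may spawn further tasks.
            let next = self.pending.borrow_mut().pop_front();
            match next {
                Some(task) => {
                    task();
                    ran += 1;
                }
                None => return ran,
            }
        }
    }
}

/// Hypothetical queue whose consumer is scheduled rather than run inline.
pub struct Queue<T> {
    scheduler: Scheduler,
    consumer: Rc<dyn Fn(T)>,
}

impl<T: 'static> Queue<T> {
    pub fn new(scheduler: Scheduler, consumer: impl Fn(T) + 'static) -> Self {
        Self {
            scheduler,
            consumer: Rc::new(consumer),
        }
    }

    /// Enqueues a message; the consumer sees it on the next `Scheduler::run`.
    pub fn send(&self, message: T) {
        let consumer = Rc::clone(&self.consumer);
        self.scheduler.spawn(move || (*consumer)(message));
    }
}

/// Hypothetical ingestion endpoint: validate, enqueue, respond immediately.
pub fn handle_webhook(queue: &Queue<String>, body: &str) -> u16 {
    if body.is_empty() {
        return 400;
    }
    queue.send(body.to_string());
    200
}
```

In a test built on this sketch, the handler returns `200` while the consumer has not yet observed the event; only after `scheduler.run()` does the processed state change, which mirrors the ordering of the assertions in the Orb webhook test.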

At the center of the SimulatedPlatform is the scheduler, a crate that powers our in-house async runtime. The scheduler is shared between GPUI—Zed's UI framework—and the Simulator used in tests.

This shared scheduler enables us to write tests that span the client and the server. So we can have a test that starts in a piece of Zed code, flows through Zed Cloud, and then asserts on the state of something in Zed after it receives the response from the backend.

Where we're headed

The work being done on Zed Cloud now is laying the foundation to support our future work around collaborative coding with DeltaDB.

If you want to work with me on building out Zed Cloud, we are currently hiring for this role.

We're looking for engineers with experience building and maintaining web APIs and platforms, solid web fundamentals, and who are excited about Rust.

If you end up applying, you can mention this blog post in your application.

I look forward to hearing from you!
