Zeroserve: A zero-config web server you can script with eBPF

Original link: https://su3.io/posts/introducing-zeroserve

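**zeroserve** is a high-performance, zero-config HTTPS server intended as an alternative to tools such as Nginx and Caddy. It serves a website directly from a single immutable tarball and uses `io_uring` for all I/O, which is where its speed and efficiency come from. Key ideas:

* **Program-as-configuration**: instead of complex declarative files, users embed eBPF programs in the tarball. These scripts run in a userspace sandbox, are JIT-compiled to native code, and handle all request logic, including routing, authentication, rate limiting, and reverse proxying.
* **Performance**: in single-core benchmarks, zeroserve consistently outperforms Nginx and Caddy when serving small static files and proxying API requests.
* **Simple operation**: deployment is atomic - replace the tarball and send `SIGHUP` to hot-reload the site, scripts, and TLS certificates without dropping connections.
* **Modern security**: native support for TLS 1.3, Encrypted Client Hello (ECH), and JA4 fingerprinting.

By folding request handling and configuration into one scriptable event loop, zeroserve offers a unified, readable, and efficient approach to modern web serving.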

Original article

Disclaimer: This article is co-authored with GPT-5.5 and Claude Opus 4.8.

zeroserve is a small, fast, zero-config HTTPS server. You hand it a tarball of a website and it serves it - over HTTP/2 and TLS 1.3, with hot reload and a tiny resident footprint. The twist is that you can drop eBPF programs into the tarball and they run on every request, in userspace, as sandboxed middleware - rewriting, authenticating, and rate-limiting requests, or reverse-proxying them to a backend when you want it to act as a gateway in front of your app.

In short:

Fast: on one core it beats nginx across most workloads - small and large static files, scripted middleware, and small-response proxying, all over HTTPS.
Efficient eBPF scripting: scripts are JIT-compiled to native code and sandboxed in userspace, cheap enough to run on every request.
Program-as-configuration: your eBPF program is the whole configuration, deciding what happens to each request.
io_uring throughout: every network and disk operation is submitted through io_uring.
Modern TLS in the box: TLS 1.3, HTTP/2, Encrypted Client Hello, SNI certificate selection, and JA4 fingerprinting.
Simple to operate: serve a whole site from one tarball and hot-reload it (and the TLS material) with a SIGHUP.

It's meant to be an alternative to nginx and Caddy, and the design bet is about configuration. Those servers give you a declarative config language - location blocks, rewrite rules, map directives, try_files - and then, once the declarative language hits its limits, an optional scripting runtime bolted on the side (Lua, or Caddy's plugins). Behavior ends up split across two layers: directives that quietly grow their own control flow, plus scripts that run somewhere in the request lifecycle you have to keep in your head.

zeroserve collapses that into one thing. There is no config file. The eBPF program is the configuration - a single, ordinary, sandboxed program that sees every request and decides what happens: routing, headers, auth, rate limiting, proxying. I want the whole request path in one program I can read top to bottom.

One tarball, served in place

The whole site is a single tar file. zeroserve indexes it on load - building a path -> byte-range map - and then serves files by issuing byte-range reads against the tarball itself. Nothing is ever unpacked to disk. The site lives entirely in that one file, so there's no document root for a stray location rule to expose, and a deploy is a single atomic file swap. To package a directory:

zeroserve --pack ./public > site.tar
zeroserve --addr 0.0.0.0:8080 site.tar
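The indexing step described above (a path -> byte-range map built on load) can be sketched in plain C. This is a minimal illustration, not zeroserve's actual code: the names tar_entry and tar_index are hypothetical, the tarball is assumed to be already in memory, and only the classic 100-byte name field is read (the ustar prefix field and the header checksum are ignored).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TAR_BLOCK 512

/* One index entry: where a file's bytes live inside the tarball. */
struct tar_entry {
  char path[101];   /* name field is at most 100 bytes, plus a terminator */
  uint64_t offset;  /* first byte of the file's data */
  uint64_t size;    /* file length in bytes */
};

/* Walk the tar headers in buf and record a byte range per regular file.
   Returns the number of entries written to out. */
size_t tar_index(const unsigned char *buf, size_t len,
                 struct tar_entry *out, size_t max) {
  assert(buf != NULL && out != NULL);
  size_t n = 0, pos = 0;
  /* an all-zero block (empty name) marks the end of the archive */
  while (pos + TAR_BLOCK <= len && buf[pos] != 0 && n < max) {
    const unsigned char *h = buf + pos;
    char field[13] = {0};
    memcpy(field, h + 124, 12);                /* size: octal ASCII */
    uint64_t size = strtoull(field, NULL, 8);
    if (size > len - pos - TAR_BLOCK) break;   /* truncated archive */
    char type = (char)h[156];                  /* '0' or NUL: regular file */
    if (type == '0' || type == '\0') {
      memcpy(out[n].path, h, 100);
      out[n].path[100] = '\0';
      out[n].offset = pos + TAR_BLOCK;
      out[n].size = size;
      n++;
    }
    /* skip the header plus the data rounded up to whole blocks */
    pos += TAR_BLOCK + (size + TAR_BLOCK - 1) / TAR_BLOCK * TAR_BLOCK;
  }
  return n;
}
```

With an index like this, serving a file is a single ranged read of size bytes at offset, which is why nothing has to be unpacked.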

Deploying a new version is "replace the tarball and send SIGHUP". The reload swaps the site, the scripts, and the TLS material atomically, in the same process, with no dropped connections:

killall -SIGHUP zeroserve

All network and disk I/O goes through io_uring (via the monoio runtime). Each instance is a single-threaded event loop. That sounds like a limitation, and per-process it is - but it's the right shape when your scaling unit is "more processes", and it's why many of them coexist happily on one box.

Scripting with eBPF, in userspace

This is the part I find most fun. Any .c file you put under .zeroserve/scripts/ gets compiled to an eBPF object at pack time (with clang and llc) and runs on every request. The eBPF runs entirely in userspace: zeroserve loads the bytecode into a runtime (async-ebpf) inside its own ordinary, unprivileged process, so the kernel's BPF subsystem and CAP_BPF stay out of it. async-ebpf JIT-compiles the bytecode to native machine code (it vendors uBPF), so your "config" runs as native x86-64.

A pointer cage does the job the kernel verifier normally would, keeping the program from reading or writing memory it shouldn't: every memory access in the JIT-compiled code is masked into the program's own arena, so a stray access stays confined to the script's own memory.
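To make the masking idea concrete, here is a small C model of a pointer cage. It is a sketch under assumptions: the arena size is taken to be a power of two so that a single AND confines the address, and the names arena, cage, and caged_store_u8 are hypothetical. How async-ebpf actually lays out and masks its arena may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct arena {
  uint8_t *base;  /* start of the script's private memory */
  size_t size;    /* assumed to be a power of two, so (size - 1) is a bit mask */
};

/* Map any offset, however wild, to a byte inside the arena. */
uint8_t *cage(const struct arena *a, uint64_t offset) {
  assert(a->size != 0 && (a->size & (a->size - 1)) == 0);
  return a->base + (offset & (a->size - 1));
}

/* A caged one-byte store: it can clobber the script's own data, nothing else. */
void caged_store_u8(const struct arena *a, uint64_t offset, uint8_t value) {
  *cage(a, offset) = value;
}
```

The point of masking rather than bounds-checking is that there is no failure branch: a bad address is silently wrapped into the script's own memory. A real implementation also has to deal with accesses wider than one byte that start near the end of the arena.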

The script runs directly on zeroserve's single event loop. To keep one slow script from stalling every other connection, the runtime is fully preemptible: a timer can interrupt JIT-compiled native code mid-execution and hand control back to the event loop.
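The general mechanism (a timer signal interrupting running native code and returning control to the caller) can be demonstrated with POSIX timers. This is only a rough model with hypothetical names: it abandons the interrupted function via siglongjmp, whereas the runtime described here hands control back to the event loop and time-slices the script, and the source does not say how that is implemented.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <sys/time.h>

static sigjmp_buf back_to_loop;

static void on_timer(int sig) {
  (void)sig;
  siglongjmp(back_to_loop, 1);  /* abandon the running script */
}

static void arm_timer(long ms) {
  struct itimerval t = {0};
  t.it_value.tv_sec = ms / 1000;
  t.it_value.tv_usec = (ms % 1000) * 1000;
  setitimer(ITIMER_REAL, &t, NULL);  /* ms == 0 disarms the timer */
}

/* Run script with a wall-clock budget. Returns 1 if it was interrupted,
   0 if it finished on its own. */
int run_with_budget(void (*script)(void), long budget_ms) {
  assert(script != NULL && budget_ms > 0);
  struct sigaction sa = {0};
  sa.sa_handler = on_timer;
  sigemptyset(&sa.sa_mask);
  sigaction(SIGALRM, &sa, NULL);

  if (sigsetjmp(back_to_loop, 1) != 0) return 1;  /* timer fired mid-script */
  arm_timer(budget_ms);
  script();
  arm_timer(0);
  return 0;
}

/* Two stand-in "scripts" for trying it out. */
static volatile unsigned long spins;
static void spin_forever(void) { for (;;) spins++; }
static void finish_quickly(void) { spins = 0; }
```

Calling run_with_budget(spin_forever, 5) returns even though spin_forever never does, which is the property that keeps one runaway script from stalling the loop.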

The programming model is a chain of scripts, run in sorted filename order, sharing a per-request metadata map. If a script calls zs_respond or zs_reverse_proxy, the chain short-circuits. Here's a script that runs first and enriches every request:

#include <zeroserve.h>

ZS_ENTRY
zs_u64 entry(void) {
  char peer[64];
  if (zs_req_peer(peer, sizeof(peer)) <= 0) zs_strcpy(peer, "unknown");

  // publish values for the HTML template pass
  zs_meta_set(ZS_STR("visitor"), ZS_STR(peer));
  // attach a header to *every* response: static files, zs_respond, proxied
  zs_meta_set(ZS_STR("zs.response.header.x-served-by"), ZS_STR("zeroserve-ebpf"));
  return 0;
}

The metadata it sets does two things. Keys under zs.response.header.* become response headers on everything. And other keys feed a tiny template pass: a <zs-meta>visitor</zs-meta> placeholder in an HTML file gets substituted on the way out. So you get dynamic-ish static pages without a template engine.
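The template pass is simple enough to model in a few lines of C. The sketch below is an illustration only: the names meta, meta_get, and render are hypothetical, and rendering unknown keys as empty text is an assumption, since the article does not say what zeroserve does with a placeholder that has no matching key.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct meta { const char *key; const char *value; };

/* Look up a key that is given as a (pointer, length) slice. */
static const char *meta_get(const struct meta *m, size_t n,
                            const char *key, size_t key_len) {
  for (size_t i = 0; i < n; i++)
    if (strlen(m[i].key) == key_len && memcmp(m[i].key, key, key_len) == 0)
      return m[i].value;
  return "";  /* assumption: unknown keys render as empty text */
}

/* Append len bytes to out, always leaving room for the terminator. */
static int append(char *out, size_t cap, size_t *used,
                  const char *s, size_t len) {
  if (len >= cap - *used) return -1;
  memcpy(out + *used, s, len);
  *used += len;
  out[*used] = '\0';
  return 0;
}

/* Copy in to out, replacing each <zs-meta>key</zs-meta> with its value.
   Returns the rendered length, or -1 if out is too small. */
long render(const char *in, const struct meta *m, size_t n,
            char *out, size_t cap) {
  const size_t open_len = strlen("<zs-meta>");
  const size_t close_len = strlen("</zs-meta>");
  size_t used = 0;
  assert(in != NULL && out != NULL && cap > 0);
  out[0] = '\0';
  for (;;) {
    const char *start = strstr(in, "<zs-meta>");
    const char *end = start ? strstr(start + open_len, "</zs-meta>") : NULL;
    if (end == NULL)  /* no complete placeholder left: copy the tail */
      return append(out, cap, &used, in, strlen(in)) == 0 ? (long)used : -1;
    const char *key = start + open_len;
    const char *value = meta_get(m, n, key, (size_t)(end - key));
    if (append(out, cap, &used, in, (size_t)(start - in)) != 0 ||
        append(out, cap, &used, value, strlen(value)) != 0)
      return -1;
    in = end + close_len;
  }
}
```

Given a visitor key set by the script above, a page containing <zs-meta>visitor</zs-meta> goes out with the peer address in its place.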

The helper surface a script can call is broad:

Request inspection and mutation: read the method, path, query params, headers, and peer address; rewrite the URI or set and remove headers before the response goes out.
Crypto and encoding: SHA-256, HMAC-SHA256, base64, hex, and getrandom.
JSON: parse a request body, build and mutate a document tree, and reply with zs_json_respond.
Rate limiting: per-key token buckets keyed on anything from a peer IP to an API key, with state that survives hot reloads.
AWS SigV4: signed Authorization headers and presigned URLs for talking to S3 and other AWS services.
OIDC login: a complete relying-party flow (Authorization Code + PKCE) that carries the entire login session in sealed XChaCha20-Poly1305 cookies, so you can gate a static site behind "log in with Google" while the server stays stateless.
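Of these helpers, the token bucket is the easiest to show in isolation. The following is a generic token bucket in C, not zeroserve's helper API: the bucket struct and bucket_allow are hypothetical names, and time is passed in explicitly so the behavior is easy to follow. A per-key limiter would keep one such bucket per peer IP or API key.

```c
#include <assert.h>

struct bucket {
  double tokens;          /* tokens currently available */
  double capacity;        /* burst size: the most tokens the bucket can hold */
  double refill_per_sec;  /* sustained request rate */
  double last;            /* time of the previous call, in seconds */
};

/* Refill for the elapsed time, then try to spend one token.
   Returns 1 if the request is allowed, 0 if it should be rejected. */
int bucket_allow(struct bucket *b, double now) {
  assert(now >= b->last);
  b->tokens += (now - b->last) * b->refill_per_sec;
  if (b->tokens > b->capacity) b->tokens = b->capacity;
  b->last = now;
  if (b->tokens < 1.0) return 0;
  b->tokens -= 1.0;
  return 1;
}
```

A bucket with capacity 2 and a refill rate of 1 per second allows two back-to-back requests, rejects a third, and allows one more a second later.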

A dynamic endpoint is just a script that responds:

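#include <zeroserve.h>

ZS_ENTRY
zs_u64 entry(void) {
  char path[64];
  zs_req_path(path, sizeof(path));
  // not our route: return without responding so the chain continues
  if (zs_strcmp(path, "/health") != 0) return 0;

  // response headers go through the metadata map; zs_respond sends the body
  zs_meta_set(ZS_STR("zs.response.header.content-type"), ZS_STR("application/json"));
  zs_respond(200, ZS_STR("{\"status\":\"ok\"}\n"));
  return 0;
}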
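The chain semantics (scripts run in sorted filename order, and the chain stops at the first one that responds) can be modelled in standalone C. This is a simplified sketch with hypothetical names: here a script signals "I responded" through its return value, whereas in zeroserve the short-circuit is triggered by calling zs_respond or zs_reverse_proxy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

enum outcome { CONTINUE = 0, RESPONDED = 1 };

struct script {
  const char *filename;
  enum outcome (*entry)(void);
};

static int by_filename(const void *a, const void *b) {
  const struct script *x = a, *y = b;
  return strcmp(x->filename, y->filename);
}

/* Sort by filename, then run scripts until one responds.
   Returns the index (after sorting) of the script that responded,
   or -1 if none did and the request falls through. */
int run_chain(struct script *scripts, size_t n) {
  assert(scripts != NULL && n > 0);
  qsort(scripts, n, sizeof scripts[0], by_filename);
  for (size_t i = 0; i < n; i++)
    if (scripts[i].entry() == RESPONDED) return (int)i;
  return -1;
}

/* Stand-in scripts; calls counts how many actually ran. */
static int calls;
static enum outcome enrich(void) { calls++; return CONTINUE; }
static enum outcome health(void) { calls++; return RESPONDED; }
static enum outcome never_reached(void) { calls++; return CONTINUE; }
```

Numeric filename prefixes such as 10-enrich.c and 20-health.c (an illustrative convention, not one the article prescribes) make the run order explicit.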

Each script runs under a memory-footprint cap (256 KB by default), the runtime time-slices long-running scripts off the executor and throttles the runaways, and scripts can even call each other (zs_call) up to a bounded depth. A script that spins forever stalls only its own request - the preemption timer interrupts it and the server keeps serving everyone else.

The TLS story underneath is more complete than the zero-config framing suggests: TLS 1.3 only, terminated by BoringSSL, with native Encrypted Client Hello (so the real SNI never appears in cleartext), SNI certificate selection from a directory, JA4 client fingerprinting exposed to scripts, and a transparent ECH relay mode that byte-for-byte forwards undecryptable handshakes to a real upstream so a protected name blends in behind a public one. That's a lot of transport security to ship in a single zero-config binary.

How fast is it?

I benchmarked zeroserve against nginx 1.26 and Caddy 2.11 over HTTPS on an 8-core Ryzen 7 3700X, each serving the same content with the same self-signed certificate. Because a zeroserve instance is single-threaded by design, the only fair comparison is per core: I pinned every server to one CPU with taskset (and held nginx to worker_processes 1 and Caddy to GOMAXPROCS=1; zeroserve is single-threaded already) and drove load with wrk -t4 -c100 from other cores, taking the median of three 10-second runs. wrk speaks HTTP/1.1, so these are HTTP/1.1-over-TLS-1.3 numbers with the handshake amortized across long-lived keep-alive connections: the steady-state cost of serving an already-open HTTPS connection.

Small static file (174 B) - the bread and butter of static sites:

server	req/s	p99
zeroserve	36,681	5.4 ms
nginx	31,226	7.8 ms
Caddy	12,830	22 ms

zeroserve serves small files about 17% faster than nginx on a single core, with a tighter tail. HTML pages, small JSON, CSS - this is the case zeroserve is tuned for.
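The "about 17%" figure is the ratio of the two req/s values minus one. A small helper (hypothetical name percent_faster) makes the arithmetic explicit:

```c
#include <assert.h>

/* How much faster a is than b, as a percentage of b's throughput. */
double percent_faster(double a_req_per_s, double b_req_per_s) {
  assert(b_req_per_s > 0.0);
  return (a_req_per_s / b_req_per_s - 1.0) * 100.0;
}
```

Plugging in the table's 36,681 and 31,226 req/s gives roughly 17.5.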

Large static file (100 KB):

server	req/s	throughput	p99
zeroserve	8,000	782 MB/s	22 ms
nginx	7,600	773 MB/s	28 ms
Caddy	6,084	590 MB/s	44 ms

All three are close here, with zeroserve a hair ahead at around 780 MB/s on one core. nginx's usual trump card for large files is sendfile(), which splices file pages from the page cache to the socket with zero userspace copies. Under TLS that path goes unused: the bytes have to be encrypted in userspace anyway (short of kernel TLS, which all three leave off), so every server is bound by the same encrypt-and-write loop, and zeroserve's io_uring read-and-write path is a touch faster at it.

eBPF vs Lua

The obvious comparison for the scripting is nginx + LuaJIT (ngx_http_lua_module), the usual way to run fast code inside a web server. So I wrote the equivalent Lua for two cases and put them head to head.

One tuning knob matters a lot here. zeroserve ships with a conservative default: it arms the script-preemption timer every 2 ms. Fine granularity makes it quick to throttle a misbehaving script, but it taxes every well-behaved one - at the default, eBPF trails nginx Lua on a fully dynamic response (about 32k req/s against 41k). Bumping --preempt-timer-interval-ms to 10 recovers ~40% of scripting throughput and turns that around:

Per-request header-injection middleware (script runs, static file is still served):

engine	req/s	p99
zeroserve eBPF (10 ms)	43,709	5.1 ms
zeroserve eBPF (2 ms default)	31,334	6.7 ms
nginx Lua (`header_filter_by_lua`)	28,653	8.4 ms

Fully dynamic JSON response:

engine	req/s	p99
zeroserve eBPF (10 ms)	46,945	4.5 ms
nginx Lua (`content_by_lua`)	41,231	6.4 ms
zeroserve eBPF (2 ms default)	32,393	6.7 ms

At the 10 ms interval, tuned eBPF wins both cases. On the middleware case - a script shaping an otherwise-static response - it beats nginx Lua by about 50%, with a tighter tail. On the fully synthetic response it edges nginx's heavily-tuned content_by_lua too (47k against 41k). Both engines compile to native code (LuaJIT is a tracing JIT; async-ebpf JITs the eBPF through uBPF), and with TLS encryption as a shared per-request cost, the tuned eBPF path comes out ahead on throughput. At the 2 ms default, eBPF keeps the middleware win but gives up the synthetic-response lead, so I'd run production scripts at 10 ms.

As a reverse proxy

Serving files is half the job; the other half is proxying to a backend, which is the main reason most people reach for nginx or Caddy in the first place. zeroserve does it from a script - zs_reverse_proxy("http://127.0.0.1:9000") - and keeps a pool of upstream connections (up to 128 per backend, 30 s idle) and reuses them across requests.
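A pool with those two limits (128 connections per backend, 30 s idle) can be sketched as a small LIFO stack of idle connections. This is an illustration with hypothetical names, not zeroserve's implementation: connections are plain integers standing in for file descriptors, time is passed in as whole seconds, and treating a connection as stale at exactly 30 s is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_MAX 128       /* idle connections kept per backend */
#define IDLE_TIMEOUT_S 30  /* idle connections older than this are dropped */

struct idle_conn { int fd; long idle_since; };
struct pool { struct idle_conn slots[POOL_MAX]; size_t count; };

/* Return a pooled connection that is still fresh, or -1 if the caller
   has to dial a new one. The most recently parked connection is tried
   first, so once it is stale everything below it is stale as well. */
int pool_acquire(struct pool *p, long now) {
  while (p->count > 0) {
    struct idle_conn c = p->slots[--p->count];
    if (now - c.idle_since < IDLE_TIMEOUT_S) return c.fd;
    /* stale: a real proxy would close(c.fd) here */
  }
  return -1;
}

/* Park a connection after a request. Returns 0 if the pool is full,
   in which case the caller should close the connection instead. */
int pool_release(struct pool *p, int fd, long now) {
  assert(fd >= 0);
  if (p->count == POOL_MAX) return 0;
  p->slots[p->count++] = (struct idle_conn){ fd, now };
  return 1;
}
```

Reuse is what avoids a fresh TCP connection to the backend on every proxied request.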

Getting a fair fight here takes care: by default nginx closes its upstream connection after every request, so I enabled keep-alive explicitly (keepalive 128, proxy_http_version 1.1, and a cleared Connection header); Caddy reuses upstream connections by default. Each proxy terminates TLS on a single core and forwards to a shared plaintext backend, a separate 2-core server that sustains 100k req/s on its own, so the measurement isolates the proxy's own overhead.

Proxying a small (174 B) response:

proxy	req/s	p50	p99
zeroserve	26,486	3.3 ms	8 ms
nginx	21,761	4.2 ms	10.5 ms
Caddy	7,683	10.3 ms	33 ms

zeroserve's pooled io_uring proxy leads here, about 22% ahead of nginx (26.5k against 21.8k) and roughly 3.4× Caddy. For the typical proxy workload - forwarding API calls, small JSON, an app server's HTML - zeroserve terminates TLS and shuttles the request to the backend faster than either nginx or Caddy.

Large bodies tip the balance back. Proxying a 100 KB response:

proxy	req/s	throughput
nginx	5,882	585 MB/s
Caddy	4,285	406 MB/s
zeroserve	3,631	359 MB/s

Once the proxied body is large, nginx's buffering moves bytes more efficiently and pulls ahead, with Caddy slotting in between and zeroserve trailing. If your proxied responses are large, nginx is the better tool; if they're small and numerous, zeroserve is faster.

Memory

Idle, a single zeroserve instance sits around 15 MB PSS - more than nginx's ~6 MB, less than Caddy's ~60 MB. On its own that's unremarkable. What makes it matter is that the unit is a whole process: when you run a copy per core, they all map the same binary, so the code pages are shared, and each extra process adds little beyond its own working set.

zeroserve is open source on GitHub - try it yourself!