Eliminating JavaScript cold starts on AWS Lambda

原始链接: https://goose.icu/lambda/

Porffor is a new JavaScript engine/runtime that compiles JS code ahead-of-time to WebAssembly and native binaries, producing executables that are smaller and faster than those from runtimes like Node.js and Bun. Unlike alternatives that bundle a runtime *with* your JS, Porffor aims for a true compilation approach, similar to C++ or Rust. Benchmarks show Porffor binaries under 1MB with millisecond-level execution times, running a simple "hello world" about 25x faster than Bun and 59x faster than Deno. Recently, Porffor was successfully deployed on AWS Lambda with impressive cold-start performance: roughly 12x faster than Node.js, about 4x faster than Amazon's LLRT, *and* over 2x cheaper than Node.js even accounting for Node's managed-runtime advantage. However, Porffor is still in early (pre-alpha) development, with limited JS support (currently over 60% complete) and no solid I/O or Node.js compatibility. Despite these limitations, the author is looking for collaborators with small, Node-API-free Lambda functions to explore the potential benefits. Full benchmark data is available on GitHub.

## Porffor: Eliminating JavaScript cold starts on AWS Lambda

A new project called Porffor aims to drastically reduce AWS Lambda cold-start times by compiling JavaScript/TypeScript applications to native code (via WebAssembly). Initial benchmarks show encouraging results, with a median of 16 milliseconds, significantly faster than a typical Node.js Lambda function.

The approach avoids a garbage collector, potentially using techniques such as per-request process forks or arena allocation, and takes advantage of the short lifetime of Lambda functions. By compiling trusted code directly to native code, it also unlocks possibilities previously constrained by the JavaScript runtime environment, such as conventional threads and low-level memory access.

Although the project is still in early development and lacks full I/O or Node.js compatibility, it validates the core idea and demonstrates its potential. Discussion highlighted the challenge of staying compatible with existing JavaScript semantics and the need for robust testing. Alternatives on Lambda such as Rust and Java, as well as AWS's LLRT, were also discussed, along with the importance of weighing more than just the initial cold-start time in production. Ultimately, Porffor represents an exciting step toward optimizing JavaScript performance in serverless architectures.

Original article

How? Enter Porffor

Porffor is my JS engine/runtime that compiles JavaScript ahead-of-time to WebAssembly and native binaries. What does that actually mean? You can compile JS files to tiny (<1MB), fast (millisecond-level) binaries:

~$ bat hi.js
─────┬──────────────────────────────────────
   1 │ console.log("hello blog!")
─────┴──────────────────────────────────────
~$ porf native hi.js hi
[271ms] compiled hi.js -> hi (12.9KB)
~$ du -h hi
16K     hi
~$ ./hi
hello blog!

Node and Bun offer “compile” options, but they bundle their runtime with your JS rather than actually compiling it as if it was C++ or Rust. Porffor does that, allowing for much smaller and faster binaries:

~$ deno compile -o hi_deno hi.js
~$ bun build --compile --outfile=hi_bun hi.js
~$ du -h hi*
16K     hi
97M     hi_bun
82M     hi_deno
4.0K    hi.js
~$ hyperfine -N "./hi" "./hi_deno" "./hi_bun" --warmup 5
Benchmark 1: ./hi
  Time (mean ± σ):     631.4 µs ± 128.5 µs    [User: 294.5 µs, System: 253.1 µs]
  Range (min … max):   465.3 µs … 1701.3 µs    2762 runs

Benchmark 2: ./hi_deno
  Time (mean ± σ):      37.4 ms ±   1.7 ms    [User: 22.5 ms, System: 16.0 ms]
  Range (min … max):    33.8 ms …  42.2 ms    74 runs

Benchmark 3: ./hi_bun
  Time (mean ± σ):      15.9 ms ±   1.2 ms    [User: 8.7 ms, System: 9.6 ms]
  Range (min … max):    13.7 ms …  19.2 ms    175 runs

Summary
  ./hi ran
   25.24 ± 5.50 times faster than ./hi_bun
   59.30 ± 12.36 times faster than ./hi_deno

What’s the trade-off? You have to re-invent the JS engine (and runtime), so it is still very early: limited JS support (but over 60% there) and currently no good I/O or Node compat (yet). But we can use these tiny, fast native binaries on Lambda!


Lambda

A few days ago I got Porffor running on Lambda, not simulated locally but really on AWS! I wrote a cold start benchmark for Node, LLRT (Amazon’s own experimental JS runtime optimizing cold starts) and Porffor running identical code:

export const handler = async () => {
  return {
    statusCode: 200,
    headers: { "Content-Type": "text/plain" },
    body: "Hello from " + navigator.userAgent + " at " + Date()
  };
};
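
For context, the post doesn't describe the deployment mechanics, but a native executable on one of Lambda's provided.* runtimes has to speak the Lambda Runtime API itself: long-poll for the next event, run the handler, then post the result back. Here is a rough sketch of that loop in plain JavaScript, purely to illustrate the protocol; it is not Porffor's actual bootstrap, and the handler import path is made up.

// Illustrative sketch of the Lambda Runtime API loop a custom-runtime
// binary has to implement (not Porffor's real bootstrap).
import { handler } from "./handler.js"; // hypothetical path to the handler above

const api = `http://${process.env.AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime`;

while (true) {
  // Long-poll for the next invocation event
  const next = await fetch(`${api}/invocation/next`);
  const requestId = next.headers.get("lambda-runtime-aws-request-id");
  const event = await next.json();

  // Run the handler and post its result back for this request id
  const result = await handler(event);
  await fetch(`${api}/invocation/${requestId}/response`, {
    method: "POST",
    body: JSON.stringify(result),
  });
}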

Since we’re benchmarking cold start, the workload doesn’t matter; we’re only interested in how the runtime starts up (for context, most Lambdas run for <1s, typically <500ms). I spent over a day just benchmarking, and even with my biases, the results surprised me.
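
If you want to poke at something similar yourself, here is a minimal sketch of one way to force and measure a cold start with the AWS SDK for JavaScript. This is not the benchmark harness from the post, and the function name is made up: any configuration change invalidates warm sandboxes, and invoking with LogType "Tail" returns the REPORT log line, whose Init Duration field is the cold-start portion.

// Minimal sketch: force a cold start, then read the REPORT line.
// Not the post's benchmark code; "porffor-hello" is a made-up function name.
import {
  LambdaClient,
  UpdateFunctionConfigurationCommand,
  InvokeCommand,
} from "@aws-sdk/client-lambda";

const lambda = new LambdaClient({});
const FunctionName = "porffor-hello";

// Any config change invalidates warm sandboxes, so the next invoke is cold.
await lambda.send(new UpdateFunctionConfigurationCommand({
  FunctionName,
  Environment: { Variables: { COLD_START_NONCE: Date.now().toString() } },
}));
await new Promise((r) => setTimeout(r, 5000)); // crude wait; a real harness would poll LastUpdateStatus

// LogType "Tail" returns the last 4KB of logs, including the REPORT line
// whose "Init Duration" field only appears on cold starts.
const res = await lambda.send(new InvokeCommand({ FunctionName, LogType: "Tail" }));
const logs = Buffer.from(res.LogResult, "base64").toString();
console.log(logs.split("\n").find((line) => line.startsWith("REPORT")));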

Node

[A graph of benchmark results for Node, explained below]

Here is Node (managed, nodejs22.x), our main comparison and baseline. Surprisingly alright, but still far from ideal: making your users wait up to 0.3s due to a technical limitation out of your control just sucks.

We don’t prioritize memory usage here, as AWS bills based on allocated memory rather than actual usage. In this benchmark, we allocate the minimum (128MB), and actual usage stays below that allocation. I’ll show the cost in GB-seconds, calculated as billed duration (from AWS) multiplied by allocated memory.
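
As a purely illustrative example of that calculation (the numbers below are made up, not from the benchmark):

// GB-seconds = allocated memory (GB) x billed duration (s); illustrative numbers only
const allocatedGB = 128 / 1024;            // 128MB allocated = 0.125GB
const billedSeconds = 0.100;               // e.g. a 100ms billed duration
console.log(allocatedGB * billedSeconds);  // 0.0125 GB-s for that invocation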

Also, Node is a managed runtime, meaning AWS supplies it for you. This significantly aids with initialization duration by allowing for effective caching. Crucially, we are not billed for this init duration, which profoundly impacts cost. (While an AWS blog post indicates that this will change starting August 1st, this data is from August and does not yet reflect such charges. I will update if this changes.)

LLRT

[A graph of benchmark results for LLRT, explained below]

LLRT is ~3x faster than Node here, great! Unfortunately in my testing, it also costs ~1.6x more than Node. This is only due to the managed runtime trick explained before. This should change when they charge for that init or create a managed runtime once LLRT is stable. Overall, ignoring that hitch, much better than Node for this benchmark!

Porffor

[A graph of benchmark results for Porffor, explained below]

Porffor is ~12x faster than Node and almost 4x faster than LLRT in this case. Plus, even with Node’s managed runtime trick, it is over 2x cheaper than Node (and almost 4x cheaper than LLRT). 🫳🎤 I hope this shows that when Porffor works, it works extremely well: Porffor’s P99 is faster than both LLRT’s and Node’s P50.


Conclusion

You might be expecting me to start shilling for you to plug Porffor into your Lambda instantly… but unfortunately not. Porffor is still very (pre-alpha) early.

Although, if you/your company have small Lambdas (ideally using no Node APIs) and want a free quick look at whether Porffor could help you, please email me! Porffor is actively improving and more code is working every day.

For full transparency: benchmark code, CSV data and graphs are available on GitHub here.
