Whatever Happened to Sandboxfs?

Original link: https://blogsystem5.substack.com/p/whatever-happened-to-sandboxfs

Between 2017 and 2020, I developed sandboxfs at Google: a user-space file system intended to improve Bazel's sandboxing performance on macOS. Bazel uses "symlink forests" to create isolated execution environments (execroots) for build actions, but creating them on macOS was slow because of the large number of system calls involved. sandboxfs offered a faster alternative by exposing virtual file hierarchies, materializing sandboxes instantly; instead of paying the setup cost upfront, it incurred overhead on I/O operations.

Initial testing showed promise, but did not fully eliminate the performance penalty of building iOS apps, which turned out to be caused by compiler state caching. The value of sandboxing for interactive builds was also questionable. On the implementation side, the project faced performance problems with the Go-based FUSE layer and with the JSON-based RPC interface. Apple's deprecation of kernel extensions and OSXFUSE going closed source compounded the problems, and would have forced a complete rewrite on top of NFSv4.

Ultimately, sandboxfs was abandoned. Yet the core idea of efficient sandboxing remains relevant, because symlink forests scale poorly as toolchains grow. Although I no longer use macOS or Bazel, I believe a sandboxfs-like solution is still needed to address the persistent sandboxing performance problem.

A Hacker News discussion centered on sandboxed build systems, particularly the challenges on macOS, with a focus on the demise of sandboxfs. Commenters touched on Apple's new FUSE-like API (FSKit) and its possible performance limitations due to aggressive compiler caching. A key question was how isolation can block the compiler's access to its caches and thereby slow down builds.

Several solutions were floated, such as integrating Bazel with LLVM's content-addressable storage (CAS), or using Swift's and Clang's "explicit module builds". The Landlock LSM (used with GNU Make) was mentioned for its speed, even though it requires multiple system calls. The discussion also considered the feasibility of sandboxing via NFS (as in the older Vesta build system) or via ASIF sparse images.

Commenters debated the suitability of macOS's containerization framework, its reliance on Linux micro-VMs, and other approaches such as using LD_PRELOAD to intercept open system calls. The impact of statically linked binaries (common in Go, Rust, OCaml, and Haskell) on sandboxing was also considered.

Original article

Back in 2017–2020, while I was on the Blaze team at Google, I took on a 20% project that turned into a bit of an obsession: sandboxfs. Born out of my work supporting iOS development, it was my attempt to solve a persistent pain point that frustrated both internal teams and external users alike: Bazel’s poor sandboxing performance on macOS.

sandboxfs was a user-space file system designed to efficiently create virtual file hierarchies backed by real files—a faster alternative to the “symlink forests” that Bazel uses to prepare per-action sandboxes. The idea was simple: if we could lower sandbox creation overhead, we could make Bazel’s sandboxing actually usable on macOS.

Unfortunately, things didn’t play out as I dreamed. Today, sandboxfs is effectively abandoned, and macOS sandboxing performance remains an unsolved problem. In this post, I’ll walk you through why I built sandboxfs, what worked, what didn’t, and why—despite its failure—I still think the core idea holds promise.

To understand how sandboxfs was intended to help with sandboxed build performance, we need to first dive into how Bazel runs build actions. For those unfamiliar with Bazel’s terminology, a build action or action is an individual build step, like a single compiler or linker execution.

To run actions, Bazel uses the strategies abstraction to decouple action tracking in the build graph from how those actions are actually executed. The default strategy for local builds is the sandboxed strategy, which isolates the processes that an action runs from the rest of the system. The goal is to make these processes behave in a deterministic manner.

The sandboxed strategy achieves action isolation via two different mechanisms:

  • The use of kernel-level sandboxing features to restrict what the action can do (limit network access, limit reads and writes to parts of the file system, etc.). One such mechanism is sandbox-exec on macOS; a sketch of what such wrapping can look like follows this list.

  • The creation of an execution root (or execroot) in which the action runs. The execroot contains the minimum set of files required for the action to run: namely, the toolchain and the action inputs (source files, toolchain dependencies, etc.). One way to do this is via symlink forests.
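To make the first mechanism tangible, here is a hedged sketch, in Rust, of wrapping a command with macOS's sandbox-exec. The SBPL profile below is purely my own illustration and is far simpler than what Bazel actually generates: real profiles need many more allowances (dyld caches, sysctl reads, Mach lookups, and so on), and the details vary across macOS versions:

use std::process::Command;

fn main() -> std::io::Result<()> {
    // Illustrative profile: deny everything by default, then allow
    // execution plus reads/writes under the sandbox directory and reads
    // of the system paths the dynamic linker needs. Hypothetical paths.
    let profile = r#"
        (version 1)
        (deny default)
        (allow process-exec*)
        (allow file-read* (subpath "/tmp/sandbox")
                          (subpath "/bin")
                          (subpath "/usr/lib")
                          (subpath "/System"))
        (allow file-write* (subpath "/tmp/sandbox"))
    "#;
    let status = Command::new("/usr/bin/sandbox-exec")
        .args(["-p", profile, "/bin/ls", "/tmp/sandbox"])
        .status()?;
    println!("exit status: {status}");
    Ok(())
}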

The default mechanism to create an execroot in Bazel is to leverage symlink forests: file hierarchies that use symlinks to refer to files that live elsewhere.

Creating a symlink forest is an operation that scales linearly with the number of files in it, and each symlink creation requires at least two system calls: one to create the symlink and another to delete it when the sandbox is torn down. Plus symlink forests typically have complex directory structures, so there are extra mkdir and rmdir operations to handle all intermediate path components. Doing thousands of these operations may only take milliseconds, but… overheads in action execution quickly compound and turn into visible build slowdowns.
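To make those numbers concrete, here is a minimal sketch, in Rust, of the kind of loop that sets up and tears down a symlink forest. This is my own illustration, not Bazel's actual code; the point is only that the syscall count grows linearly with the number of mapped files:

use std::collections::HashMap;
use std::fs;
use std::io;
use std::os::unix::fs::symlink;
use std::path::Path;

// One mkdir(2) per missing intermediate directory plus one symlink(2)
// per input file, repeated for every single action, and undone afterwards.
fn create_symlink_forest(root: &Path, mappings: &HashMap<&str, &str>) -> io::Result<()> {
    for (virtual_path, real_path) in mappings {
        let link = root.join(virtual_path);
        fs::create_dir_all(link.parent().unwrap())?; // mkdir per component
        symlink(real_path, &link)?; // symlink per file
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let mappings = HashMap::from([
        ("external/cc/bin/clang", "/usr/bin/clang"),
        ("libfoo/foo.c", "/home/jmmv/sample/libfoo/foo.c"),
    ]);
    create_symlink_forest(Path::new("/tmp/sandbox"), &mappings)?;
    // Teardown costs another unlink(2)/rmdir(2) per entry.
    fs::remove_dir_all("/tmp/sandbox")
}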

To illustrate what this means in practice, consider this target:

cc_library(
    name = "foo",
    srcs = ["foo.c"],
)

This target makes Bazel spawn one action to compile foo.c into foo.o. Said action needs to: run the compiler; read the foo.c file; and access any system includes that foo.c may reference. Thus, the sandbox used to run this action may look like this:

.../sandbox/external/cc/bin/clang -> /usr/bin/clang
.../sandbox/external/cc/include/stdio.h -> /usr/include/stdio.h
.../sandbox/external/cc/include/stdlib.h -> /usr/include/stdlib.h
.../sandbox/libfoo/foo.c -> /home/jmmv/sample/libfoo/foo.c

Having this symlink forest in place, Bazel would run the equivalent of this command to perform the compilation:

cd .../sandbox && ./external/cc/bin/clang -nostdinc -I./external/cc/include -o libfoo/foo.o -c libfoo/foo.c

When Bazel runs this, it expects that clang will only access files in the external/cc/include directory it previously created inside the sandbox. But because reality may not match expectations, Bazel wraps the command with whatever technology the host OS provides to enforce sandboxing.

Creating symlink forests on a per-action basis was very expensive on macOS… or so everyone said. When I arrived on the Blaze team, sandboxing had already been disabled by default for macOS builds, and the rationale behind that was that “symlinks were too slow”.

There were some flaws with this claim:

  1. It was impossible to prove. I ran many microbenchmarks to exercise symlink creations and deletions in large amounts and could never observe a significant performance degradation compared to Linux.

  2. Building Bazel with itself, with sandboxing enabled, did not show any sort of substantial performance loss. Yet Bazel has relatively large C++ and Java actions in its own build so you would have expected to see something.

  3. If macOS were truly bad at something as fundamental as “symlink management”, you’d imagine that someone else would have found the issue and asked about it online (as often happens with misguided NTFS complaints). But no such complaints were to be found.

Still, I devised the sandboxfs plan right after developing sourcachefs—another short-lived stint in file systems development—and I charged ahead. I wanted sandboxfs to exist because it did solve an obvious scalability issue (issuing tens of thousands of syscalls per symlink forest creation is not free) and because I wanted to use it for pkg_comp’s own benefit.

sandboxfs replaces symlink forests with a virtual file hierarchy that can be materialized in constant time. Here is the flow of operations:

  1. Bazel generates an in-memory manifest of the execroot structure and which files are backed by which other files.

  2. Bazel sends this manifest to sandboxfs via an RPC (which means we have at least one system call to send a message through a socket and a couple of context switches).

  3. sandboxfs updates its in-memory representation of the file system and exposes a new sandbox at its mount point.

  4. Bazel runs the action in the new sandbox.

  5. sandboxfs catches all I/O in the sandbox and redirects it to the relevant real backing files.

It’s this last point that presents the trade-off behind sandboxfs, because sandboxfs doesn’t make all costs magically go away. Instead of paying the cost of setting up the sandbox upfront via many system calls, we pay a different cost over all reads and writes that go through the virtual file system. The original hypothesis was that this would be worth it, because most (but not all) build actions are not I/O bound, and most build actions do not access all the files that are mapped into their sandbox.
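As a minimal sketch of that idea (my own simplification, not sandboxfs’s actual code), the whole sandbox boils down to an in-memory map: “creating” it needs no file system syscalls at all, and the real file is only touched when the action opens the virtual path:

use std::collections::HashMap;
use std::fs::File;
use std::io;

// The whole sandbox is just a map from virtual paths to backing files.
struct Sandbox {
    mappings: HashMap<String, String>,
}

impl Sandbox {
    // Constant-time setup: no mkdir/symlink syscalls, only a map move.
    fn new(mappings: HashMap<String, String>) -> Self {
        Sandbox { mappings }
    }

    // What the FUSE open handler conceptually does: redirect I/O on the
    // virtual path to the real backing file, paying the cost here.
    fn open(&self, virtual_path: &str) -> io::Result<File> {
        match self.mappings.get(virtual_path) {
            Some(real_path) => File::open(real_path),
            None => Err(io::Error::from(io::ErrorKind::NotFound)),
        }
    }
}

fn main() -> io::Result<()> {
    let sandbox = Sandbox::new(HashMap::from([(
        "libfoo/foo.c".to_string(),
        "/home/jmmv/sample/libfoo/foo.c".to_string(),
    )]));
    // Opening the virtual path transparently opens the backing file.
    let _file = sandbox.open("libfoo/foo.c")?;
    Ok(())
}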

Going back to the example from before, Bazel would send an RPC like this to sandboxfs:

{
    "CreateSandbox": {
        "Path": "/736",
        "Mappings": {
            "external/cc/bin/clang": "/usr/bin/clang",
            "external/cc/include/stdio.h": "/usr/include/stdio.h",
            "external/cc/include/stdlib.h": "/usr/include/stdlib.h",
            "libfoo/foo.c": "/home/jmmv/sample/libfoo/foo.c"
        }
    }
}

And this would cause the following file hierarchy to be immediately available under the mount point:

.../sandboxfs/736/external/cc/bin/clang
.../sandboxfs/736/external/cc/include/stdio.h
.../sandboxfs/736/external/cc/include/stdlib.h
.../sandboxfs/736/libfoo/foo.c

Note that I did not write what these files point to in this snippet because sandboxfs does not use symlinks. sandboxfs exposes the files as if they were real files, and it does that to prevent tools from resolving symlinks and discovering sibling files they aren’t supposed to see. From the point of view of clang when it runs, the file it sees at .../sandboxfs/736/external/cc/include/stdio.h is a copy of whatever is in /usr/include/stdio.h.
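To see why this matters, here is a tiny illustration (my own, with hypothetical paths) of the leak that symlink forests permit and that sandboxfs closes:

use std::fs;

fn main() -> std::io::Result<()> {
    // With a symlink forest, resolving the mapped header escapes the
    // sandbox and reveals the real location...
    let real = fs::canonicalize("/tmp/sandbox/external/cc/include/stdio.h")?;
    println!("resolved to {}", real.display()); // e.g. /usr/include/stdio.h
    // ...from which the tool can list the real directory and see
    // headers that were never mapped into the sandbox.
    for entry in fs::read_dir(real.parent().unwrap())? {
        println!("sibling: {}", entry?.path().display());
    }
    Ok(())
}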

Overall, sandboxfs was a fun exercise and a great journey to learn more about Rust, FUSE and file systems, and macOS internals:

  • I got to learn Rust. I was lucky to find a random coworker at Google who offered to review my code, and his input was an invaluable learning resource for me.

  • I got to learn about FUSE in quite a bit of detail. I had already played with it before, but by working on sandboxfs, I had to debug some gnarly problems.

  • I got to experience rewriting pre-existing Go code in Rust (because the original sandboxfs implementation was in Go). This was an enlightening exercise because, as I tried to convert the code “verbatim”, I discovered many subtle concurrency bugs and data races that Rust just didn’t let me write.

  • The initial performance evaluation of using sandboxfs for real iOS builds showed promise: I observed that a specific iOS app “only” got a 55% performance penalty when using sandboxfs instead of the 270% penalty it got from symlink forests. A good win, but insufficient to justify enabling sandboxing by default.

So, what went wrong? Many things, really. Let’s start with the wrong assumptions:

  • Symlink forest creation may not have been the biggest problem in sandboxing performance. As I mentioned in the opening, microbenchmarking this area of macOS didn’t show obvious slowdowns and building Bazel with itself didn’t show major performance differences with and without sandboxing. But iOS builds suffered massively from sandboxing, and the problem was elsewhere: the Objective C and Swift compilers cache persistent state on disk, and sandboxing was preventing such state from actually being persisted.

  • The need for sandboxing on interactive builds was questionable. Yes, it’d have been neat to have it, but in practice, the benefits are small: if your CI builds are powered by remote execution, which tends to happen when you use Bazel, then the implicit sandboxing of remote execution gives you almost all of the protections that you’d get from sandboxing locally.

There were also implementation problems:

  • The original implementation of sandboxfs was written in Go, and I hit performance issues with the way bazil/fuse dealt with FUSE operations.

  • The previous issue was fixed by rewriting sandboxfs in Rust, but then I hit performance problems with the JSON-based RPC interface that sandboxfs had grown in a rush. Fixing this properly required a deep redesign to use path compression (a sketch of that idea follows this list) and to bypass JSON altogether. But I didn’t get to this because…

  • Kernel bugs / limitations in OSXFUSE erased the possibility of implementing a critical performance optimization.
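For illustration, here is a hypothetical sketch of what that path compression could have looked like (this redesign was never implemented): group mappings under shared directory prefixes so that common path components cross the RPC boundary only once, instead of once per mapped file as in the flat JSON encoding:

use std::collections::BTreeMap;

#[derive(Debug, Default)]
struct Node {
    backing: Option<String>,          // set on leaves: the real backing file
    children: BTreeMap<String, Node>, // set on intermediate directories
}

fn insert(root: &mut Node, virtual_path: &str, backing: &str) {
    let mut node = root;
    for component in virtual_path.split('/') {
        node = node.children.entry(component.to_string()).or_default();
    }
    node.backing = Some(backing.to_string());
}

fn main() {
    let mut root = Node::default();
    insert(&mut root, "external/cc/include/stdio.h", "/usr/include/stdio.h");
    insert(&mut root, "external/cc/include/stdlib.h", "/usr/include/stdlib.h");
    // "external", "cc" and "include" now appear once each in the tree,
    // rather than being repeated in every mapping entry.
    println!("{:#?}", root);
}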

And then I also hit unexpected changes in the ecosystem:

  • Apple deprecated kernel extensions, making the use of FUSE really convoluted and its future uncertain. Apple provided alternate APIs to implement file systems in user space, but those were designed for iCloud-style services and were/are not suitable for sandboxfs.

  • At around the same time in 2019, OSXFUSE went closed source. This meant that relying on it for any future work was not well-advised. There were still code dumps for older versions, but that was not something I was able to maintain.

  • Because of the previous two, I would have had to expose the sandboxfs virtual file system over NFSv4 instead of FUSE. Buildbarn’s bb-clientd provides a dual FUSE/NFSv4 implementation, which proves that this is technically doable, but adding an NFSv4 frontend to sandboxfs meant having to rewrite it from scratch. Plus I’m not sure we’d have gotten good-enough performance if we went this route.

  • At that point in mid-2019, given the other problems illustrated above… I had neither the interest nor the time to rewrite sandboxfs “correctly” (remember, this was a 20% project at first, which unsurprisingly turned into a 120% project). It’d have been nice to do though, because “now I know how to do it right”.

I still believe that Bazel needs something like sandboxfs for efficient sandboxed builds. As I mentioned earlier, creating symlink forests does not scale for action execution, and with ever-growing toolchain sizes, the problem is getting worse over time. However, the benefits of local sandboxing are unclear if you are already using remote execution.

That said, people keep complaining about poor Bazel sandboxing performance on macOS, which means there still is a clear user need to make this better. And I’m not convinced that the various “workarounds” that have been tried in this area (like reusing sandboxes) are sound designs, nor that they can actually deliver on their promise.

In my case… I don’t run Bazel on Mac anymore at work. What’s more: I do not even use a Mac for personal reasons these days, which means my ulterior motive to use sandboxfs in pkg_comp is gone. But! If you wanted to recreate sandboxfs from scratch, let’s talk!
