Back in 2017–2020, while I was on the Blaze team at Google, I took on a 20% project that turned into a bit of an obsession: sandboxfs. Born out of my work supporting iOS development, it was my attempt to solve a persistent pain point that frustrated both internal teams and external users alike: Bazel’s poor sandboxing performance on macOS.
sandboxfs was a user-space file system designed to efficiently create virtual file hierarchies backed by real files—a faster alternative to the “symlink forests” that Bazel uses to prepare per-action sandboxes. The idea was simple: if we could lower sandbox creation overhead, we could make Bazel’s sandboxing actually usable on macOS.
Unfortunately, things didn’t play out as I dreamed. Today, sandboxfs is effectively abandoned, and macOS sandboxing performance remains an unsolved problem. In this post, I’ll walk you through why I built sandboxfs, what worked, what didn’t, and why—despite its failure—I still think the core idea holds promise.
To understand how sandboxfs was intended to help with sandboxed build performance, we need to first dive into how Bazel runs build actions. For those unfamiliar with Bazel’s terminology, a build action or action is an individual build step, like a single compiler or linker execution.
To run actions, Bazel uses the strategies abstraction to decouple action tracking in the build graph from how those actions are actually executed. The default strategy for local builds is the sandboxed strategy, which isolates the processes that an action runs from the rest of the system. The goal is to make these processes behave in a deterministic manner.
The sandboxed strategy achieves action isolation via two different mechanisms:
The use of kernel-level sandboxing features to restrict what the action can do (limit network access, limit reads and writes to parts of the file system, etc.). One such mechanism is sandbox-exec on macOS.
The creation of an execution root (or execroot) in which the action runs. The execroot contains the minimum set of files required for the action to run: namely, the toolchain and the action inputs (source files, toolchain dependencies, etc.). One way to do this is via symlink forests.
The default mechanism to create an execroot in Bazel is to leverage symlink forests: file hierarchies that use symlinks to refer to files that live elsewhere.
Creating a symlink forest is an operation that scales linearly with the number of files in it, and each symlink costs at least two system calls: one to create it and another to delete it when the sandbox is torn down. Plus, symlink forests typically have complex directory structures, so there are extra mkdir and rmdir operations to handle all intermediate path components. Doing thousands of these operations may only take milliseconds, but… overheads in action execution quickly compound and turn into visible build slowdowns.
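To make the linear scaling concrete, here is a small Rust sketch (a hypothetical helper of my own, not Bazel code) that counts the file system operations implied by a symlink forest for a given set of virtual paths. Every symlink implies a creation and a deletion syscall, and every intermediate directory implies a mkdir and an rmdir:

```rust
use std::collections::BTreeSet;

/// Count the file system operations needed to build and later tear down
/// a symlink forest for the given virtual paths: one symlink(2) plus one
/// unlink(2) per mapped file, and one mkdir(2) plus one rmdir(2) per
/// intermediate directory. The total grows linearly with the manifest.
pub fn forest_operation_count(paths: &[&str]) -> usize {
    let mut dirs: BTreeSet<String> = BTreeSet::new();
    for path in paths {
        // Collect every intermediate directory: "external/cc/bin/clang"
        // needs "external", "external/cc", and "external/cc/bin".
        let components: Vec<&str> = path.split('/').collect();
        let mut prefix = String::new();
        for component in &components[..components.len() - 1] {
            if !prefix.is_empty() {
                prefix.push('/');
            }
            prefix.push_str(component);
            dirs.insert(prefix.clone());
        }
    }
    // Two syscalls per symlink (create + delete) plus two per directory
    // (mkdir + rmdir).
    2 * paths.len() + 2 * dirs.len()
}
```

For the four-file sandbox in the example below, this already comes out to 18 operations; a real action with thousands of inputs multiplies that accordingly.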
To illustrate what this means in practice, consider this target:
cc_library(
    name = "foo",
    srcs = ["foo.c"],
)
This target makes Bazel spawn one action to compile foo.c into foo.o. Said action needs to: run the compiler; read the foo.c file; and access any system includes that foo.c may reference. Thus, the sandbox used to run this action may look like this:
.../sandbox/external/cc/bin/clang -> /usr/bin/clang
.../sandbox/external/cc/include/stdio.h -> /usr/include/stdio.h
.../sandbox/external/cc/include/stdlib.h -> /usr/include/stdlib.h
.../sandbox/libfoo/foo.c -> /home/jmmv/sample/libfoo/foo.c
Having this symlink forest in place, Bazel would run the equivalent of this command to perform the compilation:
cd .../sandbox && ./external/cc/bin/clang -nostdinc -I./external/cc/include -o libfoo/foo.o -c libfoo/foo.c
When Bazel runs this, it expects that clang will only access the files it previously created inside the sandbox, such as the headers under the external/cc/include directory. But because reality may not match expectations, Bazel wraps the command with whatever technology the host OS provides to enforce sandboxing.
Creating symlink forests on a per-action basis was very expensive on macOS… or so everyone said. When I arrived on the Blaze team, sandboxing had already been disabled by default on macOS builds, and the rationale was that “symlinks were too slow”.
There were some flaws with this claim:
It was impossible to prove. I ran many microbenchmarks that exercised symlink creation and deletion in large numbers and could never observe a significant performance degradation compared to Linux.
Building Bazel with itself, with sandboxing enabled, did not show any substantial performance loss. Yet Bazel has relatively large C++ and Java actions in its own build, so you would have expected to see something.
If macOS were truly bad at something as fundamental as “symlink management”, you’d imagine that someone else would have found the issue and asked about it online (as often happens with misguided NTFS complaints). But there were none to be found.
Still, I devised the sandboxfs plan right after developing sourcachefs—another short-lived stint in file systems development—and I charged ahead. I wanted sandboxfs to exist because it did solve an obvious scalability issue (issuing tens of thousands of syscalls per symlink forest creation is not free) and because sandboxfs could also serve pkg_comp’s own benefit.
sandboxfs replaces symlink forests with a virtual file hierarchy that can be materialized in constant time. Here is the flow of operations:
Bazel generates an in-memory manifest of the execroot structure and which files are backed by which other files.
Bazel sends this manifest to sandboxfs via an RPC (which means we have at least one system call to send a message through a socket and a couple of context switches).
sandboxfs updates its in-memory representation of the file system and exposes a new sandbox at its mount point.
Bazel runs the action in the new sandbox.
sandboxfs catches all I/O in the sandbox and redirects it to the relevant real backing files.
It’s this last point that presents the trade-off behind sandboxfs, because sandboxfs doesn’t make all costs magically go away. Instead of paying the cost of setting up the sandbox upfront via many system calls, we pay a different cost on every read and write that goes through the virtual file system. The original hypothesis was that this would be worth it, because most (but not all) build actions are not I/O bound, and most build actions do not access all the files mapped into their sandbox.
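The trade-off can be sketched with a toy in-memory model. The `VirtualFs` type and its methods below are illustrative assumptions of mine, not sandboxfs’s actual API: sandbox creation is a single map insertion no matter how many files the manifest maps, while every subsequent access pays a lookup to find the backing file:

```rust
use std::collections::HashMap;

/// One sandbox's manifest: virtual path -> backing path on the real disk.
type Manifest = HashMap<String, String>;

/// Toy model of the virtual file system's state: one manifest per sandbox.
#[derive(Default)]
pub struct VirtualFs {
    sandboxes: HashMap<String, Manifest>,
}

impl VirtualFs {
    /// "CreateSandbox": registering a manifest is one in-memory insertion,
    /// independent of how many files it maps -- no per-file syscalls at
    /// setup time, unlike a symlink forest.
    pub fn create_sandbox(&mut self, id: &str, manifest: Manifest) {
        self.sandboxes.insert(id.to_string(), manifest);
    }

    /// The deferred cost: every read or write that reaches the file system
    /// must first resolve the virtual path to its backing file.
    pub fn resolve(&self, id: &str, virtual_path: &str) -> Option<&str> {
        self.sandboxes.get(id)?.get(virtual_path).map(String::as_str)
    }
}
```

In the real implementation the lookup happens on every FUSE operation, which is exactly why most builds (mostly CPU-bound actions, touching only a fraction of their mapped files) were expected to come out ahead.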
Going back to the example from before, Bazel would send an RPC like this to sandboxfs:
{
  "CreateSandbox": {
    "Path": "/736",
    "Mappings": {
      "external/cc/bin/clang": "/usr/bin/clang",
      "external/cc/include/stdio.h": "/usr/include/stdio.h",
      "external/cc/include/stdlib.h": "/usr/include/stdlib.h",
      "libfoo/foo.c": "/home/jmmv/sample/libfoo/foo.c"
    }
  }
}
And this would cause the following file hierarchy to be immediately available under the mount point:
.../sandboxfs/736/external/cc/bin/clang
.../sandboxfs/736/external/cc/include/stdio.h
.../sandboxfs/736/external/cc/include/stdlib.h
.../sandboxfs/736/libfoo/foo.c
Note that I did not write what these files point to in this snippet because sandboxfs does not use symlinks. sandboxfs exposes the files as if they were real files, and it does so to prevent tools from resolving symlinks and discovering sibling files they aren’t supposed to see. From the point of view of clang when it runs, the file it sees at .../sandboxfs/736/external/cc/include/stdio.h is a copy of whatever is in /usr/include/stdio.h.
Overall, sandboxfs was a fun exercise and a great journey to learn more about Rust, FUSE and file systems, and macOS internals:
I got to learn Rust. I was lucky to find a random coworker at Google who offered to review my code, and his input was an invaluable learning resource for me.
I got to learn about FUSE in quite a bit of detail. I had already played with it before, but by working on sandboxfs, I had to debug some gnarly problems.
I got to experience rewriting pre-existing Go code in Rust (because the original sandboxfs implementation was in Go). This was an enlightening exercise because, as I tried to convert the code “verbatim”, I discovered many subtle concurrency bugs and data races that Rust just didn’t let me write.
The initial performance evaluation of using sandboxfs for real iOS builds showed promise: I observed that a specific iOS app “only” got a 55% performance penalty when using sandboxfs instead of the 270% penalty it got from symlink forests. A good win, but insufficient to justify enabling sandboxing by default.
Many things really. Let’s start with wrong assumptions:
Symlink forest creation may not have been the biggest problem in sandboxing performance. As I mentioned in the opening, microbenchmarking this area of macOS didn’t show obvious slowdowns, and building Bazel with itself didn’t show major performance differences with and without sandboxing. But iOS builds suffered massively from sandboxing, and the problem was elsewhere: the Objective-C and Swift compilers cache persistent state on disk, and sandboxing was preventing that state from actually being persisted.
The need for sandboxing on interactive builds was questionable. Yes, it’d have been neat to have it, but in practice the benefits are small: if your CI builds are powered by remote execution, which tends to be the case when you use Bazel, then the implicit sandboxing of remote execution gives you almost all the protections you’d get from local sandboxing.
There were also implementation problems:
The original implementation of sandboxfs was written in Go, and I hit performance issues with the way bazil/fuse dealt with FUSE operations.
The previous issue was fixed by rewriting sandboxfs in Rust, but then I hit performance problems with the JSON-based RPC interface that sandboxfs had grown in a rush. Fixing this properly required a deep redesign to use path compression and to bypass JSON altogether. But I didn’t get to this because…
Kernel bugs / limitations in OSXFUSE erased the possibility of implementing a critical performance optimization.
And then I also hit unexpected changes in the ecosystem:
Apple deprecated kernel extensions, making the use of FUSE really convoluted and its future uncertain. Apple provided alternate APIs to implement file systems in user space, but those were designed for iCloud-style services and were/are not suitable for sandboxfs.
At around the same time in 2019, OSXFUSE went closed source. This meant that relying on it for any future work was not well-advised. There were still code dumps for older versions, but that was not something I was able to maintain.
Because of the previous two, I would have had to expose the sandboxfs virtual file system over NFSv4 instead of FUSE. Buildbarn’s bb-clientd provides a dual FUSE/NFSv4 implementation, which proves that this is technically doable, but adding an NFSv4 frontend to sandboxfs meant having to rewrite it from scratch. Plus I’m not sure we’d have gotten good-enough performance if we went this route.
At that point in mid-2019, given the other problems illustrated above… I had neither the interest nor the time to rewrite sandboxfs “correctly” (remember, this was a 20% project at first, which unsurprisingly turned into a 120% project). It’d have been nice to do though, because “now I know how to do it right”.
I still believe that Bazel needs something like sandboxfs for efficient sandboxed builds. As I mentioned earlier, creating symlink forests does not scale for action execution, and with ever-growing toolchain sizes, the problem is getting worse over time. However, the benefits of local sandboxing are unclear if you are already using remote execution.
That said, people keep complaining about poor Bazel sandboxing performance on macOS, which means there still is a clear user need to make this better. And I’m not convinced that the various “workarounds” that have been tried in this area (like reusing sandboxes) are sound designs, or that they can actually deliver on their promise.
In my case… I don’t run Bazel on a Mac anymore at work. What’s more: I do not even use a Mac for personal reasons these days, which means my ulterior motive to use sandboxfs in pkg_comp is gone. But! If you wanted to recreate sandboxfs from scratch, let’s talk!