Fil-C: A memory-safe C implementation

Original link: https://lwn.net/SubscriberLink/1042938/658ade3768dd4758/

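## Fil-C: Memory safety for C and C++

Fil-C is a novel compiler, derived from Clang, that aims to bring memory safety to existing C and C++ code *without* modifying the source. Developed primarily by Filip Pizlo, it does this through a distinctive "InvisiCaps" pointer system (which its documentation likens to a software implementation of CHERI) that adds run-time checks to prevent memory errors such as use-after-free. Although it was slow at first, Fil-C has been optimized significantly: its generated code is now only a few times slower than Clang's, and the overhead is sometimes negligible in practice (as running Bash with it demonstrated). It relies on a concurrent garbage collector and memory-safe signal handling, both of which synchronize through "safe points". Fil-C requires all code to be recompiled with it in order to link correctly, but it has successfully compiled a memory-safe Linux user space based on Linux From Scratch. It addresses a key security need, because memory-unsafe behavior is a common source of vulnerabilities in C programs. Fil-C is a young project, but it offers a promising way to retrofit safety onto existing applications where preventing vulnerabilities matters more than performance.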

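## Fil-C: A memory-safe C implementation - Summary

Fil-C is a new implementation of C designed for memory safety, with the goal of protecting existing C codebases without rewriting them in languages such as Rust. Developed by Pizlonator, it uses a novel approach, "InvisiCaps", to track pointer provenance and prevent common memory errors. A Nix package-manager integration, "filnix", lets users build Nix packages with Fil-C and use a binary cache to speed up builds. Early testing shows that projects such as tmux, nethack, and coreutils compile successfully. There is a performance cost (typically around 4x slower), but the project targets the *users* of C programs rather than their *authors*, prioritizing safety for applications that handle untrusted input. Potential uses include hardening user-space applications and flatpaks, and even Apple's bootloader (which already uses a similar approach). The project currently focuses on x86_64 for testing but is not fundamentally limited to it; because it is built on LLVM, it has a path to broader portability.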

Original article

By Daroc Alden
October 28, 2025

Fil-C is a memory-safe implementation of C and C++ that aims to let C code — complete with pointer arithmetic, unions, and other features that are often cited as a problem for memory-safe languages — run safely, unmodified. Its dedication to being "fanatically compatible" makes it an attractive choice for retrofitting memory-safety into existing applications. Despite the project's relative youth and single active contributor, Fil-C is capable of compiling an entire memory-safe Linux user space (based on Linux From Scratch), albeit with some modifications to the more complex programs. It also features memory-safe signal handling and a concurrent garbage collector.

Fil-C is a fork of Clang; it's available under an Apache v2.0 license with LLVM exceptions for the runtime. Changes from the upstream compiler are occasionally merged in, with Fil-C currently being based on version 20.1.8 from July 2025. The project is a personal passion of Filip Pizlo, who has previously worked on the runtimes of a number of managed languages, including Java and JavaScript. When he first began the project, he was not sure that it was even possible. The initial implementation was prohibitively slow to run, since it needed to insert a lot of different safety checks. This has given Fil-C a reputation for slowness. Since the initial implementation proved viable, however, Pizlo has managed to optimize a number of common cases, making Fil-C-generated code only a few times slower than Clang-generated code, although the exact slowdown depends heavily on the structure of the benchmarked program.

Reliable benchmarking is notoriously finicky, but in order to get some rough feel for whether that level of performance impact would be problematic, I compiled Bash version 5.2.32 with Fil-C and tried using it as my shell. Bash is nearly a best case for Fil-C, because it spends more time running external programs than running its own code, but I still expected the performance difference to be noticeable. It wasn't. So, at least for some programs, the performance overhead of Fil-C does not seem to be a problem in practice.

In order to support its various run-time safety checks, Fil-C does use a different internal ABI than Clang does. As a result, objects compiled with Fil-C won't link correctly against objects generated by other compilers. Since Fil-C is a full implementation of C and C++ at the source-code level, however, in practice this just requires everything to be recompiled with Fil-C. Inter-language linking, such as with Rust, is not currently supported by the project.

Capabilities

The major challenge of rendering C memory-safe is, of course, pointer handling. This is especially complicated by the fact that, as the long road to CHERI-compatibility has shown, many programs expect a pointer to be 32 or 64 bits, depending on the architecture. Fil-C has tried several different ways to represent pointers since the project's beginning in 2023. Fil-C's first pointers were 256 bits, not thread-safe, and didn't protect against use-after-free bugs. The current implementation, called "InvisiCaps", allows for pointers that appear to match the natural pointer size of the architecture (although this requires storing some auxiliary information elsewhere), with full support for concurrency and catching use-after-free bugs, at the expense of some run-time overhead.
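
To make the pointer-size problem concrete, the snippet below shows the kind of word-size assumption that ordinary C code makes. It is a generic illustration, not taken from Fil-C or any particular program, and `is_aligned()` and `union ptr_word` are hypothetical names. Code like this inspects a pointer's integer value or overlays it with a same-sized integer, so it would likely break under any scheme that visibly widened pointers; keeping the visible pointer at its natural size is what InvisiCaps is for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Generic C, not taken from Fil-C: code like this assumes that a pointer is
 * a plain 32- or 64-bit integer under the hood. */
_Static_assert(sizeof(void *) == 4 || sizeof(void *) == 8,
               "this code assumes 32- or 64-bit pointers");

/* Hypothetical helper: test alignment by inspecting the pointer's bits.
 * align must be a power of two. */
static bool is_aligned(const void *p, uintptr_t align) {
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical union that overlays a pointer with a same-sized integer,
 * e.g. to hash the address or stash flags in its low bits. */
union ptr_word {
    void *ptr;
    uintptr_t bits;
};
```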

Fil-C's documentation compares InvisiCaps to a software implementation of CHERI: pointers are separated into a trusted "capability" piece and an untrusted "address" piece. Since Fil-C controls how the program is compiled, it can ensure that the program doesn't have direct access to the capabilities of any pointers, and therefore the runtime can rely on them being uncorrupted. The tricky part of the implementation comes from how these two pieces of information are stored in what looks to the program like 64 bits.

When Fil-C allocates an object on the heap, it adds two metadata words before the start of the allocated object: an upper bound, used to check accesses to the object based on its size, and an "aux word" that is used to store additional pointer metadata. When the program first writes a pointer value into an object, the runtime allocates a new auxiliary allocation of the same size as the object being written into, and puts an actual hardware-level pointer (i.e., one without an attached capability) to the new allocation into the aux word of the object. This auxiliary allocation, which is invisible to the program being compiled, is used to store the associated capability information for the pointer being stored (and is also reused for any additional pointers stored into the object later). The address value is stored into the object as normal, so any C bit-twiddling techniques that require looking at the stored value of the pointer work as expected.
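
The layout described above can be modeled in ordinary C. This is a minimal sketch under stated assumptions, not Fil-C's implementation: `obj`, `ptr`, `store_ptr()`, and `load_ptr()` are hypothetical, payloads are treated as arrays of pointer-sized slots, and a capability is modeled simply as a reference to the object a pointer may access. It shows the two header words, the lazily created aux allocation of the same size as the object, and the address stored in the object itself where C code can see it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy model of the heap layout; all names are hypothetical. */
typedef struct obj obj;

/* A pointer value as the compiler might track it (an assumption of this
 * model): the untrusted address plus a trusted capability. */
typedef struct {
    uintptr_t addr;
    obj *cap;
} ptr;

struct obj {
    uintptr_t upper;     /* metadata word 1: one past the last payload byte */
    obj **aux;           /* metadata word 2: "aux word", NULL until needed */
    uintptr_t payload[]; /* what the program sees, one word per slot */
};

static size_t obj_words(const obj *o) {
    return (o->upper - (uintptr_t)o->payload) / sizeof(uintptr_t);
}

static obj *obj_new(size_t nwords) {
    obj *o = calloc(1, sizeof *o + nwords * sizeof(uintptr_t));
    if (!o) abort();
    o->upper = (uintptr_t)(o->payload + nwords);
    return o;
}

/* Bounds check derived from the upper-bound header word. */
static void check_slot(const obj *o, size_t slot) {
    if (slot >= obj_words(o)) {
        fprintf(stderr, "out-of-bounds access to slot %zu\n", slot);
        abort();
    }
}

/* Storing a pointer: the address goes into the object as usual; the
 * capability goes into the same slot of the aux allocation, which is
 * created on the first pointer store and reused afterwards. */
static void store_ptr(obj *o, size_t slot, ptr p) {
    check_slot(o, slot);
    if (!o->aux) {
        o->aux = calloc(obj_words(o), sizeof *o->aux);
        if (!o->aux) abort();
    }
    o->payload[slot] = p.addr;
    o->aux[slot] = p.cap;
}

/* Loading a pointer costs an extra indirection through the aux word. */
static ptr load_ptr(const obj *o, size_t slot) {
    check_slot(o, slot);
    ptr p = { o->payload[slot], o->aux ? o->aux[slot] : NULL };
    return p;
}
```

In this model, once the first pointer is stored, an object costs its payload plus an equally sized aux allocation, and every `load_ptr()` reads through `o->aux`: the doubled memory and extra indirection that the overhead figures come from.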

This approach does mean that structures containing pointers end up using twice as much memory, and every load of a pointer involves an extra indirection through the aux word. The documentation claims that, in practice, this makes most programs run about four times more slowly, although that number depends on how heavily a program uses pointers. Still, Pizlo has ideas for several optimizations that he hopes will bring the overhead down over time.

One wrinkle with this approach is atomic access to pointers (i.e., using _Atomic or volatile). Luckily, there is no problem that cannot be solved with more pointer indirection: when the program loads or stores a pointer value atomically, instead of having the auxiliary allocation contain the capability information directly, it points to a third 128-bit allocation that stores the capability and pointer value together. That allocation can be updated with 128-bit atomic instructions, if the platform supports them, or by creating new allocations and atomically swapping the pointers to them.
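
The fallback for platforms without 128-bit atomics, allocating a new box and swapping a pointer to it, can be sketched with C11 `<stdatomic.h>`. This is an illustrative model, not Fil-C's code; `box`, `atomic_slot`, and the `slot_*` functions are hypothetical, and replaced boxes are deliberately leaked on the assumption that a garbage collector would reclaim them.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy model of an atomically accessed pointer slot; hypothetical names. */
typedef struct {
    uintptr_t addr;      /* the pointer value the program sees */
    const void *cap;     /* its capability (opaque in this model) */
} box;                   /* 16 bytes on a 64-bit target */

typedef struct {
    _Atomic(box *) current;  /* one more level of indirection */
} atomic_slot;

static box *box_new(uintptr_t addr, const void *cap) {
    box *b = malloc(sizeof *b);
    if (!b) abort();
    b->addr = addr;
    b->cap = cap;
    return b;
}

static void slot_init(atomic_slot *s, uintptr_t addr, const void *cap) {
    atomic_init(&s->current, box_new(addr, cap));
}

/* Store: build a complete new box, then publish it with one pointer-sized
 * atomic store. The old box is leaked here (assumed reclaimed by a GC). */
static void slot_store(atomic_slot *s, uintptr_t addr, const void *cap) {
    atomic_store_explicit(&s->current, box_new(addr, cap), memory_order_release);
}

/* Load: one atomic pointer load yields a consistent (addr, cap) pair. */
static box slot_load(atomic_slot *s) {
    return *atomic_load_explicit(&s->current, memory_order_acquire);
}

/* Exchange: swap in a new box and return the previous pair. */
static box slot_exchange(atomic_slot *s, uintptr_t addr, const void *cap) {
    return *atomic_exchange_explicit(&s->current, box_new(addr, cap),
                                     memory_order_acq_rel);
}
```

Because the address and capability only ever change together, as a fresh box published by one pointer-sized atomic operation, a concurrent reader in this model can never observe a torn pair.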

Since the aux word is used to store a pointer value, Fil-C can use pointer tagging to store some additional information there as well; that is used to indicate special types of objects that need to be handled differently, such as functions, threads, and mmap()-backed allocations. It's also used to mark freed objects, so that any access results in an error message and a crash.
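
Tagging the aux word might look like the following. The article does not say which bits Fil-C uses or how the tags are encoded, so this sketch simply assumes that aux allocations are at least 8-byte aligned and packs a two-bit object kind plus a freed bit into the low three bits; `aux_make()`, `aux_mark_freed()`, and the `KIND_*` values are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical tag layout: three low bits are free if aux allocations are
 * aligned like malloc() results (alignment of max_align_t). */
_Static_assert(_Alignof(max_align_t) >= 8, "need three free low bits");

#define AUX_KIND_MASK ((uintptr_t)0x3)  /* bits 0-1: special object kind */
#define AUX_FREED     ((uintptr_t)0x4)  /* bit 2: object has been freed */
#define AUX_TAG_MASK  ((uintptr_t)0x7)

enum aux_kind { KIND_ORDINARY, KIND_FUNCTION, KIND_THREAD, KIND_MMAP };

static uintptr_t aux_make(void *aux_alloc, enum aux_kind kind) {
    assert(((uintptr_t)aux_alloc & AUX_TAG_MASK) == 0);  /* bits must be free */
    return (uintptr_t)aux_alloc | (uintptr_t)kind;
}

static void *aux_pointer(uintptr_t w) { return (void *)(w & ~AUX_TAG_MASK); }

static enum aux_kind aux_kind_of(uintptr_t w) {
    return (enum aux_kind)(w & AUX_KIND_MASK);
}

static bool aux_is_freed(uintptr_t w) { return (w & AUX_FREED) != 0; }

/* On free(): keep the kind bits, set the freed bit, and drop the pointer so
 * the aux allocation itself can be released right away. */
static uintptr_t aux_mark_freed(uintptr_t w) {
    return (w & AUX_KIND_MASK) | AUX_FREED;
}
```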

Memory management

When an object is freed, its aux word marks it as a free object, which lets the auxiliary allocation be reclaimed immediately. The original object can't be freed immediately, however. Otherwise, a program could free an object, allocate a new object in the same location, and thereby cover up use-after-free bugs. Instead, Fil-C uses a garbage collector to free an object's backing memory only once all of the pointers to it go away. Unlike other garbage collectors for C — such as the Boehm-Demers-Weiser garbage collector — Fil-C can use the auxiliary capability information to track live objects precisely.
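
The free-then-collect behavior can be sketched as a toy model. It is not Fil-C's code: `heap_obj`, `obj_free()`, and `obj_load_byte()` are hypothetical, the freed marker is an assumed bit in the aux word, checks return `false` instead of crashing so they can be tested, and `obj_collect()` stands in for the garbage collector deciding that no pointers remain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy model; the freed flag is an assumed bit in the aux word. */
#define AUX_FREED ((uintptr_t)0x4)

typedef struct {
    uintptr_t upper;         /* one past the last payload byte */
    uintptr_t aux;           /* aux pointer (0 if none), or AUX_FREED */
    unsigned char payload[];
} heap_obj;

static heap_obj *obj_alloc(size_t n) {
    heap_obj *o = calloc(1, sizeof *o + n);
    if (!o) abort();
    o->upper = (uintptr_t)(o->payload + n);
    return o;  /* aux stays 0 until a pointer is stored (not modeled) */
}

/* free(): release the aux allocation immediately and mark the object freed,
 * but keep its memory so the address cannot be reused under a dangling
 * pointer. Returns false on a double free. */
static bool obj_free(heap_obj *o) {
    if (o->aux & AUX_FREED) return false;
    free((void *)o->aux);  /* free(NULL) is a no-op */
    o->aux = AUX_FREED;
    return true;
}

/* Checked load; a real runtime would print an error and crash instead. */
static bool obj_load_byte(const heap_obj *o, uintptr_t addr, unsigned char *out) {
    if (o->aux & AUX_FREED) return false;                               /* use after free */
    if (addr < (uintptr_t)o->payload || addr >= o->upper) return false; /* out of bounds */
    *out = *(const unsigned char *)addr;
    return true;
}

/* Stand-in for the garbage collector: call only once no pointers to o remain. */
static void obj_collect(heap_obj *o) { free(o); }
```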

Fil-C's garbage collector is both parallel (collection happens faster the more cores are available) and concurrent (collection happens without pausing the program). Technically, the garbage collector does require threads to occasionally pause just long enough to tell it where pointers are located on the stack, but that only occurs at special "safe points" — otherwise, the program can load and manipulate pointers without notifying the garbage collector. Safe points are used as a synchronization barrier: the collector can't know that an object is really garbage until every thread has passed at least one safe point since it finished marking. This synchronization is done with atomic instructions, however, so in practice threads never need to pause for longer than a few instructions.
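
The handshake rule in the paragraph above can be expressed with C11 atomics. This single-threaded model is a sketch, not Fil-C's collector: the epoch counter and the `safepoint()` and `collector_may_sweep()` functions are hypothetical, and the mutator records are driven by hand rather than by real threads. It shows that checking in costs each thread only a couple of atomic operations, while the collector waits until every thread has acknowledged the handshake before treating unmarked objects as garbage.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the safe-point handshake; hypothetical names. */
#define MAX_MUTATORS 8

typedef struct {
    atomic_ulong seen_epoch;  /* last handshake this thread acknowledged */
} mutator;

static atomic_ulong handshake_epoch = 1;
static mutator mutators[MAX_MUTATORS];
static size_t n_mutators;

/* Collector, after it finishes marking: ask every thread to check in. */
static unsigned long collector_request_handshake(void) {
    return atomic_fetch_add(&handshake_epoch, 1) + 1;
}

/* Mutator safe point: a few atomic operations and no waiting. A real
 * runtime would also report this thread's stack roots here. */
static void safepoint(mutator *m) {
    unsigned long e = atomic_load_explicit(&handshake_epoch, memory_order_acquire);
    if (atomic_load_explicit(&m->seen_epoch, memory_order_relaxed) != e)
        atomic_store_explicit(&m->seen_epoch, e, memory_order_release);
}

/* Collector: unmarked objects are known to be garbage only once every
 * thread has passed a safe point since the handshake was requested. */
static bool collector_may_sweep(unsigned long epoch) {
    for (size_t i = 0; i < n_mutators; i++)
        if (atomic_load_explicit(&mutators[i].seen_epoch, memory_order_acquire) < epoch)
            return false;
    return true;
}
```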

The exception is the implementation of fork(), which uses the safe points needed by the garbage collector to temporarily pause all of the threads in the program in order to prevent race conditions while forking. Fil-C inserts a safe point at every backward control-flow edge, i.e., whenever code could execute in a loop. In the common case, the inserted code just needs to load a flag register and confirm that the garbage collector has not requested anything be done. If the garbage collector does have a request for the thread, the thread runs a callback to perform the needed synchronization.
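
Written out by hand, the poll that the compiler inserts might look like the loop below. This is an illustration under assumptions, not Fil-C's generated code: the `pending_request` flag, `safepoint_poll()`, and `safepoint_slow_path()` are hypothetical, and a real runtime would keep a flag per thread rather than one global.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hand-written model of a compiler-inserted safe-point poll; hypothetical. */
static atomic_int pending_request;  /* set by the collector (or for fork()) */
static unsigned slow_path_runs;     /* counts slow-path entries for the demo */

/* Rare path: e.g. report stack roots, or park while another thread forks. */
static void safepoint_slow_path(void) {
    slow_path_runs++;
    atomic_store_explicit(&pending_request, 0, memory_order_release);
}

/* Common path: a single load and a branch that is almost never taken. */
static inline void safepoint_poll(void) {
    if (atomic_load_explicit(&pending_request, memory_order_acquire))
        safepoint_slow_path();
}

static long sum_array(const long *a, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += a[i];
        safepoint_poll();  /* stands in for the poll on the loop's back edge */
    }
    return total;
}
```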

Fil-C uses the same safe-point mechanism to implement signal handling. Signal handlers are only run when the interrupted thread reaches a safe point. That, in turn, allows signal handlers to allocate and free memory without interfering with the garbage collector's operation; Fil-C's malloc() is signal-safe.
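
Deferring handlers to safe points resembles a familiar pattern: the handler registered with the OS only records that a signal arrived, and the program's real handler runs later from ordinary code. The sketch below uses only ISO C `signal()` and `raise()` to show that pattern; it is not Fil-C's runtime, and `record_signal()`, `program_handler()`, and `safepoint_run_signals()` are hypothetical names.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy model of running signal handlers at safe points; hypothetical names. */
static volatile sig_atomic_t pending_sigint;  /* written by the real handler */
static int deferred_runs;                     /* counts deferred handler runs */

/* The handler registered with the OS: it only records that the signal
 * arrived, which is async-signal-safe. */
static void record_signal(int sig) {
    (void)sig;
    pending_sigint = 1;
}

/* The program's own handler. It runs outside the asynchronous signal
 * context, so calling malloc() and free() here is safe. */
static void program_handler(void) {
    free(malloc(64));
    deferred_runs++;
}

/* Called from safe points: run the handler if its signal is pending. */
static void safepoint_run_signals(void) {
    if (pending_sigint) {
        pending_sigint = 0;
        program_handler();
    }
}

static bool install_deferred_handler(void) {
    return signal(SIGINT, record_signal) != SIG_ERR;
}
```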

Memory-safe Linux

Linux From Scratch (LFS) is a tutorial on compiling one's own complete Linux user space. It walks through the steps of compiling and installing all of the core software needed for a typical Linux user space in a chroot() environment. Pizlo has successfully run through LFS with Fil-C to produce a memory-safe version, although a non-Fil-C compiler is still needed to build some fundamental components, such as Fil-C's own runtime, the GNU C library, and the kernel. (While Fil-C's runtime relies on a normal copy of the GNU C library to make system calls, the programs that Fil-C compiles use a Fil-C-compiled version of the library.)

The process is mostly identical to LFS up through the end of chapter 7, because everything prior to that point consists of using cross-build tools to obtain a working compiler in the chroot() environment. The one difference is that the cross-build tools are built with a different configured prefix, so that they won't conflict with Fil-C. At that point, one can build a copy of Fil-C and use it to mostly replace the existing compiler. The remaining steps of LFS are unchanged.

Scripts to automate the process are included in the Fil-C Git repository, including some steps from Beyond Linux From Scratch that result in a working graphical user interface and a handful of more complicated applications such as Emacs.

Overall, Fil-C offers a remarkably complete solution for making existing C programs memory-safe. While it does nothing for undefined behavior that is not related to memory safety, the most pernicious and difficult-to-prevent security vulnerabilities in C programs tend to rely on exploiting memory-unsafe behavior. Readers who have already considered and rejected Fil-C for their use case due to its early performance problems may wish to take a second look — although anyone hoping for stability might want to wait for others to take the plunge, given the project's relative immaturity. That said, for existing applications where a sizeable performance hit is preferable to an exploitable vulnerability, Fil-C is an excellent choice.