Show HN: Threadprocs – executables sharing one address space (0-copy pointers)

Original link: https://github.com/jer-irl/threadprocs

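This project explores "threadprocs", a new approach on Linux (aarch64/x86_64) that blends the process and thread models. Threadprocs behave like independent programs with their own runtime (libc, etc.), but they run inside a shared address space, which allows zero-copy data sharing through direct pointer access. The system uses a server to host the shared space and a launcher to start programs in it. Applications share pointers out-of-band (for example, via copy/paste or sockets) and then access the data at those addresses directly. A server-global scratch space, reached through an extra auxv entry, supports service discovery and IPC. The main limitations are that `brk()`/`sbrk()` cannot be used reliably, that debugging tools built on `ptrace()` are not supported, and that position-independent code is required. Crucially, memory must be allocated and freed within the *same* threadproc; cross-threadproc frees are not possible. Despite these challenges, the project demonstrates a potential alternative to traditional pthreads, with a distinctive approach to shared memory and inter-process communication. A framework, `tproc-actors`, builds on the concept with a custom memory management scheme.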

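## Threadprocs: independent programs sharing an address space

A new project called "threadprocs" explores a novel programming model that launches independent executables inside a single shared virtual address space. The goal is to combine the strengths of threads (fast, zero-copy pointer sharing) and processes (isolation, independent binaries) while avoiding the drawbacks of both.

Unlike traditional threads or plugin systems, threadprocs run complete executables with a `main()` function, allowing process-like composition. Crucially, because the address space is shared, pointers remain valid across these threadprocs, so data structures such as `std::string` can be accessed directly without serialization.

The implementation manipulates the address space layout and uses a user-space analogue of `exec()`. It comes with a significant trade-off, however: there is no memory protection. One threadproc *can* corrupt another's memory, which calls for careful code or memory-safe languages. The author acknowledges that this is not necessarily a practical solution, but rather an experiment that pushes the boundaries of process isolation and inter-process communication. The discussion highlighted similarities to older operating systems and potential uses in performance-critical scenarios, such as avoiding serialization overhead in shared-memory systems.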
Original text

This repository contains experimental code for thread-like processes, or multiple programs running in a shared address space. Each threadproc behaves like a process with its own executable, globals, libc instance, etc., but pointers are valid across threadprocs. This blends the POSIX process model with the POSIX multi-threading programming model, and enables things like zero-copy access to pointer-based data structures.

All Markdown files were written by hand.

See tproc-actors for one possible application framework building on top of threadprocs.

The code for the demoed programs is at example/sharedstr/allocstr.cpp and example/sharedstr/printstr.cpp, and neither contains any magic (/proc/[pid]/mem, etc), nor awareness of the server and launcher.

  • allocstr reads input, and copies it into a new std::string, and prints &newstring to console.
  • printstr reads a pointer as hex text, and prints whatever std::string it finds there.
demo.mp4
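
The pointer handoff in the demo can be pictured with a small sketch. This is not the code from example/sharedstr; the helper names `address_to_hex` and `hex_to_address` are hypothetical, and the sketch assumes both sides are built against the same `std::string` ABI. One side formats an object's address as hex text, the other parses the text and dereferences the result.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: format an object's address as hex text, the way a
// producer such as allocstr might print it to the console.
std::string address_to_hex(const std::string* p) {
    std::ostringstream out;
    out << std::hex << reinterpret_cast<std::uintptr_t>(p);
    return out.str();
}

// Hypothetical helper: parse hex text back into a pointer. The result is only
// safe to dereference if the address is mapped in the current address space
// and really holds a live std::string with a compatible layout.
const std::string* hex_to_address(const std::string& text) {
    // Base 16 accepts an optional "0x" prefix.
    auto raw = static_cast<std::uintptr_t>(std::stoull(text, nullptr, 16));
    return reinterpret_cast<const std::string*>(raw);
}
```

In an ordinary pair of processes, the parsed address would point at unrelated or unmapped memory. The round trip is only meaningful when producer and consumer share an address space, which is what the server and launcher arrange.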

server memory diagram

The server utility "hosts" a virtual address space, and by using launcher to start programs, those launched programs coexist in the hosted address space.

Applications can share pointers in the virtual address space through some out-of-band mechanism (Demo uses copy/paste, dummy_server/client uses sockets, libtproc provides server-global scratch space), and then directly dereference those pointers, as they're valid in the shared address space.

libtproc provides basic detection of execution as a threadproc, and allows hosted threadprocs to access a "server-global" scratch space. Applications can build tooling using this space to implement service discovery and bootstrap shared memory-backed IPC.

This is implemented by adding another entry to the threadproc auxv.
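
As a rough illustration of how an extra auxv entry could be consumed, the sketch below uses glibc's `getauxval()`. The key `kAssumedTprocAuxType` is a made-up placeholder, not the value threadprocs actually uses, and the function names are hypothetical; libtproc's real interface may look quite different.

```cpp
#include <cassert>
#include <sys/auxv.h>

// HYPOTHETICAL key, chosen only for illustration. The real entry type used
// by threadprocs is not stated here.
constexpr unsigned long kAssumedTprocAuxType = 0x74707263UL;

// Returns the value stored under the assumed key, interpreted as an address,
// or nullptr when the entry is absent. getauxval() returns 0 for keys that
// are not present in the auxiliary vector.
void* find_scratch_space() {
    unsigned long value = getauxval(kAssumedTprocAuxType);
    return reinterpret_cast<void*>(value);
}

// A program could use the presence of the entry to detect whether it was
// started inside a hosted address space.
bool running_as_threadproc() {
    return find_scratch_space() != nullptr;
}
```

When run as a normal process, the lookup finds nothing and `running_as_threadproc()` returns false, so a program written this way still works outside a server.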

tproc-actors uses this space to advertise per-threadproc actor registries.

Use Linux on aarch64 or x86_64; other architectures are not supported. This was developed in a VM running Debian on a MacBook Air M1, and also tested in a Debian x86_64 GitHub Codespace using the .devcontainer/ configuration.

Dependencies:

apt install build-essential liburing-dev
# May need to install gcc 14+
git submodule update --init

Notably, there is no dependency on ELF libraries aside from Linux system headers, though those would probably make the code nicer.

Building:

Run auto integration tests:

Or run your own programs in a shared address space:

./buildout/server /tmp/mytest.sock &
./buildout/launcher /tmp/mytest.sock program1 arg1 arg2 &
./buildout/launcher /tmp/mytest.sock program2 arg3 arg4

Read the overview or implementation for information on the project, or read comparisons to existing work. I've also collected some lessons learned in conclusions.

  • Each threadproc has its own runtime library instance (libc), and care must be taken not to call malloc() in one threadproc but try to free() that memory in another threadproc.
  • Target applications must be compiled as "position independent code," as do any dynamically loaded objects.
    • This is standard for dynamically linked libraries, and default for executable binaries compiled in many modern distros in order to support flavors of ASLR.
    • Properly architected libraries can mitigate most drawbacks of this, and executable files also carry minimal overhead.
  • brk() (and sbrk()) cannot be used reliably, because they are "address space global" to the kernel, and processes typically assume they won't be called from unexpected places.
    • The server sets the MALLOC_MMAP_THRESHOLD_=0 environment variable for children to avoid the default glibc behavior and avoid these calls.
  • mmap with MAP_FIXED can't be used without first "reserving" a non-fixed mapping.
    • This is generally true of any program, and "unreserved" MAP_FIXED use is unsafe even in standard Linux programs.
    • See the manpage section Using MAP_FIXED safely
  • Debugging and ptrace() are not supported.
    • It may be possible to add partial support, but I suspect GDB makes some assumptions that would be difficult to satisfy
  • The threadproc's PID is not the same as the launching process, so operations in terms of PID may lead to issues if applications rely on details of PID-targeted operations.
  • Signals are forwarded from the launcher to the threadproc, but signals that cannot be caught (such as SIGKILL) are not.
    • There are likely other edge cases if a threadproc relies on details of POSIX signal behavior.
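
The MAP_FIXED point above follows the usual reserve-then-place pattern described in the mmap manpage. A minimal sketch is shown below; the function names `reserve_region` and `commit_fixed` are hypothetical and are not taken from the threadprocs source.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

// Reserve `len` bytes of address space without making them usable yet.
// The kernel picks the address, so nothing existing is overwritten.
void* reserve_region(std::size_t len) {
    void* p = mmap(nullptr, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}

// Place a usable mapping at `addr`. This is only safe when [addr, addr+len)
// lies inside a reservation the caller owns, because MAP_FIXED silently
// replaces whatever was mapped there before.
void* commit_fixed(void* addr, std::size_t len) {
    void* p = mmap(addr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}
```

In a shared address space the stakes are higher than usual: an unreserved MAP_FIXED call could replace a mapping that belongs to a different program rather than to the caller.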

There are other less pertinent limitations around the edges. For example, threadprocs have /proc/[pid]/comm values which reflect their launched binary, but cmdline isn't settable. exec() syscalls also "escape" the threadproc scheme, which is probably desired but may cause subtle issues.

My initial vision was for threadprocs to pass std::unique_ptrs to each other, and support IPC with nested data. ABI aside, the major hiccup is that even if threadproc 1 releases a pointer, and threadproc 2 wraps it in a std::unique_ptr, when the destructor is called and it comes time to de-allocate the memory, threadproc 2 won't be able to do so.

Having independent libc, libstdc++, and Rust libstd instances for each tproc greatly reduces the technical dependencies on launched programs, but it also means that a threadproc cannot deallocate memory allocated by another threadproc.

One could architect their application around this limitation, and ensure memory is always handed back to the tproc which allocated it so it can be de-allocated correctly. I've sketched out an application framework that automatically passes objects back to their allocating threadproc in a custom unique_ptr analogue, see tproc-actors.
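
One way to picture "handing memory back to its allocator" is an owning pointer whose deleter does not free anything, but instead pushes the object onto a list that the allocating side drains later. The sketch below is an illustration under that assumption, not the tproc-actors design; `Node`, `ReturnBin`, `GiveBack`, and `make_borrowed` are hypothetical names. The list is intrusive and lock-free, so returning an object never allocates in the borrower's runtime.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// The link lives inside the allocation, so giving a node back needs no
// memory from the borrower's allocator.
template <typename T>
struct Node {
    T value;
    Node* next = nullptr;
};

// Owned by the allocating side. Borrowers push, only the owner deletes.
template <typename T>
class ReturnBin {
public:
    // Callable from any thread: prepend the node with a CAS loop.
    void give_back(Node<T>* n) noexcept {
        Node<T>* head = head_.load(std::memory_order_relaxed);
        do {
            n->next = head;
        } while (!head_.compare_exchange_weak(head, n,
                                              std::memory_order_release,
                                              std::memory_order_relaxed));
    }

    // Called by the owner: detach the whole list and free each node with the
    // same runtime that allocated it. Returns how many nodes were freed.
    std::size_t drain() noexcept {
        Node<T>* n = head_.exchange(nullptr, std::memory_order_acquire);
        std::size_t freed = 0;
        while (n != nullptr) {
            Node<T>* next = n->next;
            delete n;
            ++freed;
            n = next;
        }
        return freed;
    }

private:
    std::atomic<Node<T>*> head_{nullptr};
};

// Deleter that returns the node to its bin instead of freeing it.
template <typename T>
struct GiveBack {
    ReturnBin<T>* bin;
    void operator()(Node<T>* n) const noexcept { bin->give_back(n); }
};

template <typename T>
using Borrowed = std::unique_ptr<Node<T>, GiveBack<T>>;

// Allocate on the owner's side and wrap the node so that destruction
// anywhere routes it back to `bin`.
template <typename T>
Borrowed<T> make_borrowed(ReturnBin<T>& bin, T value) {
    return Borrowed<T>(new Node<T>{std::move(value), nullptr},
                       GiveBack<T>{&bin});
}
```

A borrower should treat the payload as read-only: mutating something like a `std::string` could make it reallocate with the borrower's allocator, which reintroduces the original problem. The sketch also assumes both sides agree on the layout of `Node<T>` and `std::atomic`.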

This is an interesting direction, and raises questions that could be explored further, but I don't think this is a practical model for any serious software. Pthreads are a somewhat stagnant abstraction, but they have the benefit of decades of tooling and language development shaped around them. I haven't extracted the brainworm yet, though, and perhaps in the future I'll explore ways to augment shared memory regions with custom allocators, fixed mappings, etc.
