Comparison of C/POSIX standard library implementations for Linux

Original link: https://www.etalabs.net/compare_libcs.html

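This document compares several C/POSIX standard library implementations for Linux (musl, uClibc, dietlibc and glibc), focusing on the balance between features and leanness. Musl generally excels in security, standards conformance and small footprint, while glibc is feature-rich and performance-optimized. Dietlibc prioritizes minimal size but lacks features; uClibc strikes a balance between features and size but is more complex. The comparison covers size (file size, memory overhead), behavior on resource exhaustion, performance (memory allocation, string operations, threads), ABI and versioning, the algorithms used, features (C standard conformance, threads, locale support), target architectures, build environment, security/hardening measures, and licensing. It highlights the trade-offs among size, performance, standards conformance and security. The author acknowledges a bias toward musl, since he is musl's author. The performance figures come from one particular Intel Atom machine and are only meant to give an idea of relative performance.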

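A Hacker News thread discussed this comparison of C/POSIX standard library implementations for Linux. Pizlonator noted that in his Fil-C project ("Yololand"), switching from glibc to musl caused a 1-2% performance drop, which he suspects is due to glibc's better `memcpy` implementation. He clarified Fil-C's unusual architecture, which has a "userland" and a "yololand", each with its own libc. Skissane questioned the need to use libc's `memcpy` in "yololand" and suggested alternatives. Several commenters pointed out that the comparison table is out of date and that its creator is musl's author. Performance differences were discussed: musl is sometimes slower, especially for floating-point printing and memory allocation (although the latter has improved with "mallocng"). Some attributed musl's slowness to its default allocator, which can be replaced with alternatives such as mimalloc. glibc was described as "bloated but fast" because of its complex implementation and its broad support for many architectures and for internationalization. Others noted that glibc is only slow when fast fork is needed, and that it excels thanks to heavily optimized string/memory functions that use SIMD instructions.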

  • Original article
    Comparison of C/POSIX standard library implementations for Linux

    A project of Eta Labs.

    The table below and notes which follow are a comparison of some of the different standard library implementations available for Linux, with a particular focus on the balance between feature-richness and bloat. I have tried to be fair and objective, but as I am the author of musl, that may have influenced my choice of which aspects to compare.

    Future directions for this comparison include detailed performance benchmarking and inclusion of additional library implementations, especially Google's Bionic and other BSD libc ports.

    Bloat comparison                   | musl | uClibc | dietlibc | glibc
    Complete .a set                    | 426k | 500k   | 120k     | 2.0M †
    Complete .so set                   | 527k | 560k   | 185k     | 7.9M †
    Smallest static C program          | 1.8k | 5k     | 0.2k     | 662k
    Static hello (using printf)        | 13k  | 70k    | 6k       | 662k
    Dynamic overhead (min. dirty)      | 20k  | 40k    | 40k      | 48k
    Static overhead (min. dirty)       | 8k   | 12k    | 8k       | 28k
    Static stdio overhead (min. dirty) | 8k   | 24k    | 16k      | 36k
    Configurable featureset            | no   | yes    | minimal  | minimal

    Behavior on resource exhaustion    | musl            | uClibc     | dietlibc        | glibc
    Thread-local storage               | reports failure | aborts     | n/a             | aborts
    SIGEV_THREAD timers                | no failure      | n/a        | n/a             | lost overruns
    pthread_cancel                     | no failure      | aborts     | n/a             | aborts
    regcomp and regexec                | reports failure | crashes    | reports failure | crashes
    fnmatch                            | no failure      | unknown    | no failure      | reports failure
    printf family                      | no failure      | no failure | no failure      | reports failure
    strtol family                      | no failure      | no failure | no failure      | no failure

    Performance comparison             | musl  | uClibc | dietlibc | glibc
    Tiny allocation & free             | 0.005 | 0.004  | 0.013    | 0.002
    Big allocation & free              | 0.027 | 0.018  | 0.023    | 0.016
    Allocation contention, local       | 0.048 | 0.134  | 0.393    | 0.041
    Allocation contention, shared      | 0.050 | 0.132  | 0.394    | 0.062
    Zero-fill (memset)                 | 0.023 | 0.048  | 0.055    | 0.012
    String length (strlen)             | 0.081 | 0.098  | 0.161    | 0.048
    Byte search (strchr)               | 0.142 | 0.243  | 0.198    | 0.028
    Substring (strstr)                 | 0.057 | 1.273  | 1.030    | 0.088
    Thread creation/joining            | 0.248 | 0.126  | 45.761   | 0.142
    Mutex lock/unlock                  | 0.042 | 0.055  | 0.785    | 0.046
    UTF-8 decode buffered              | 0.073 | 0.140  | 0.257    | 0.351
    UTF-8 decode byte-by-byte          | 0.153 | 0.395  | 0.236    | 0.563
    Stdio putc/getc                    | 0.270 | 0.808  | 7.791    | 0.497
    Stdio putc/getc unlocked           | 0.200 | 0.282  | 0.269    | 0.144
    Regex compile                      | 0.058 | 0.041  | 0.014    | 0.039
    Regex search (a{25}b)              | 0.188 | 0.188  | 0.967    | 0.137
    Self-exec (static linked)          | 234µs | 245µs  | 272µs    | 457µs
    Self-exec (dynamic linked)         | 446µs | 590µs  | 675µs    | 864µs

    ABI and versioning comparison      | musl       | uClibc | dietlibc     | glibc
    Stable ABI                         | yes        | no     | unofficially | yes
    LSB-compatible ABI                 | incomplete | no     | no           | yes
    Backwards compatibility            | yes        | no     | unofficially | yes
    Forwards compatibility             | yes        | no     | unofficially | no
    Atomic upgrades                    | yes        | no     | no           | no
    Symbol versioning                  | no         | no     | no           | yes

    Algorithms comparison              | musl        | uClibc    | dietlibc        | glibc
    Substring search (strstr)          | twoway      | naive     | naive           | twoway
    Regular expressions                | dfa         | dfa       | backtracking    | dfa
    Sorting (qsort)                    | smoothsort  | shellsort | naive quicksort | introsort
    Allocator (malloc)                 | musl-native | dlmalloc  | diet-native     | ptmalloc

    Features comparison                | musl                    | uClibc                | dietlibc                  | glibc
    Conformant printf                  | yes                     | yes                   | no                        | yes
    Exact floating point printing      | yes                     | no                    | no                        | yes
    C99 math library                   | yes                     | partial               | no                        | yes
    C11 threads API                    | yes                     | no                    | no                        | no
    C11 thread-local storage           | yes                     | yes                   | no                        | yes
    GCC libstdc++ compatibility        | yes                     | yes                   | no                        | yes
    POSIX threads                      | yes                     | yes, on most archs    | broken                    | yes
    POSIX process scheduling           | stub                    | incorrect             | no                        | incorrect
    POSIX thread priority scheduling   | yes                     | yes                   | no                        | yes
    POSIX localedef                    | no                      | no                    | no                        | yes
    Wide character interfaces          | yes                     | yes                   | minimal                   | yes
    Legacy 8-bit codepages             | no                      | yes                   | minimal                   | slow, via gconv
    Legacy CJK encodings               | no                      | no                    | no                        | slow, via gconv
    UTF-8 multibyte                    | native; 100% conformant | native; nonconformant | dangerously nonconformant | slow, via gconv; nonconformant
    Iconv character conversions        | most major encodings    | mainly UTFs           | no                        | the kitchen sink
    Iconv transliteration extension    | no                      | no                    | no                        | yes
    Openwall-style TCB shadow          | yes                     | no                    | no                        | no
    Sun RPC, NIS                       | no                      | yes                   | yes                       | yes
    Zoneinfo (advanced timezones)      | yes                     | no                    | yes                       | yes
    Gmon profiling                     | no                      | no                    | yes                       | yes
    Debugging features                 | no                      | no                    | no                        | yes
    Various Linux extensions           | yes                     | yes                   | partial                   | yes

    Target architectures comparison    | musl           | uClibc    | dietlibc | glibc
    i386                               | yes            | yes       | yes      | yes
    x86_64                             | yes            | yes       | yes      | yes
    x86_64 x32 ABI (ILP32)             | experimental   | no        | no       | non-conforming
    ARM                                | yes            | yes       | yes      | yes
    Aarch64 (64-bit ARM)               | yes            | no        | no       | yes
    MIPS                               | yes            | yes       | yes      | yes
    SuperH                             | yes            | yes       | no       | yes
    Microblaze                         | yes            | partial   | no       | yes
    PowerPC (32- and 64-bit)           | yes            | yes       | yes      | yes
    Sparc                              | no             | yes       | yes      | yes
    Alpha                              | no             | yes       | yes      | yes
    S/390 (32-bit)                     | no             | no        | yes      | yes
    S/390x (64-bit)                    | yes            | no        | yes      | yes
    OpenRISC 1000 (or1k)               | yes            | no        | no       | not upstream
    Motorola 680x0 (m68k)              | yes            | yes       | no       | yes
    MMU-less microcontrollers          | yes, elf/fdpic | yes, bflt | no       | no

    Build environment comparison       | musl    | uClibc         | dietlibc | glibc
    Legacy-code-friendly headers       | partial | yes            | no       | yes
    Lightweight headers                | yes     | no             | yes      | no
    Usable without native toolchain    | yes     | no             | yes      | no
    Respect for C namespace            | yes     | LFS64 problems | no       | LFS64 problems
    Respect for POSIX namespace        | yes     | LFS64 problems | no       | LFS64 problems

    Security/hardening comparison      | musl | uClibc    | dietlibc | glibc
    Attention to corner cases          | yes  | yes       | no       | too much malloc
    Safe UTF-8 decoder                 | yes  | yes       | no       | yes
    Avoids superlinear big-O's         | yes  | sometimes | no       | yes
    Stack smashing protection          | yes  | yes       | no       | yes
    Heap corruption detection          | yes  | no        | no       | yes

    Misc. comparisons                  | musl | uClibc   | dietlibc | glibc
    License                            | MIT  | LGPL 2.1 | GPL 2    | LGPL 2.1+ w/exceptions

    In general

    For each comparison in the table, each library is marked in red, yellow, or green. Red or yellow indicates that the library fails to support a feature or satisfy an optimality condition that may be desirable to some users.

    For comparisons involving testing and measurement, the particular library versions compared are:

    • musl 1.1.5
    • uClibc 0.9.33.2 (Buildroot 2015.02)
    • dietlibc 0.32
    • glibc 2.19

    Note that previous versions of this comparison included eglibc rather than glibc, mainly since Debian-based distributions were using the eglibc fork during the time in which glibc was essentially unmaintained. Since most of eglibc has been merged back into glibc and eglibc is being discontinued, the comparison has been updated based on glibc.

    Bloat comparison

    Roughly speaking, “bloat” is used to refer to overhead cost that does not contribute to the functioning of an application.

    All figures are approximate, based on tests of the versions of these libraries available on systems I use. I've used size(1) instead of file size, since static library files are roughly 80% ELF header overhead for the contained object files. Part of what makes the shared libraries larger than their static equivalents is that they include parts of libgcc for long division and other math functions.

    The size totals for glibc include the size of iconv modules, roughly 5M, in the “Complete .so set” figure. These are essential to providing certain functionality, and should be installed whether static or dynamic linking is being used.

    The smallest C program is:

    int main() {}

    And the "hello" program I used is:

    #include <stdio.h>
    int main(int argc, char **argv) { printf("hello %d\n", argc); }

    I've written it this way to ensure that the compiler cannot optimize the string printed to a constant and replace the call to printf with a call to puts.

    Overhead is measured in dirty pages, i.e. the amount of swap-backed physical memory each process requires. These are a mix of private copy-on-write maps of the program image on disk, the heap, the stack, and anonymous maps. The /proc/$pid/smaps file was used to obtain the numbers for a program spinning in an infinite loop.
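    The article does not show exactly how the numbers were pulled out of smaps. As a rough sketch, assuming the figure is the sum of the Private_Dirty fields over all mappings (the helper name sum_private_dirty_kb is hypothetical), the parsing could look like this:

    ```c
    #include <stdio.h>

    /* Sum the Private_Dirty fields (in kB) of every mapping listed in an
     * smaps-format file such as "/proc/self/smaps".
     * Assumption: Private_Dirty approximates the "min. dirty" figures;
     * the article's exact accounting may differ.
     * Returns -1 if the file cannot be opened. */
    long sum_private_dirty_kb(const char *smaps_path)
    {
        FILE *f = fopen(smaps_path, "r");
        if (!f)
            return -1;

        char line[512];
        long total = 0;
        while (fgets(line, sizeof line, f)) {
            long kb;
            /* Matching lines look like "Private_Dirty:        12 kB";
             * mapping headers and other fields fail the literal match. */
            if (sscanf(line, "Private_Dirty: %ld kB", &kb) == 1)
                total += kb;
        }
        fclose(f);
        return total;
    }
    ```

    Called with "/proc/self/smaps" from a program that is otherwise just spinning, this gives a number comparable in spirit to the table's "min. dirty" rows.
    
    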

    Dynamic linking overhead is largely dependent on the dynamic linker. A good 12-16k of the dynamic overhead is due to inefficiency in the standard dynamic linker. Ideally, replacing it could drop the overhead difference between static- and dynamic-linked programs to a single page.

    It should be noted that uClibc was tested with many optional features enabled, particularly locale. Due to a bug (design flaw) in uClibc's locale support, locale loading code and malloc get linked even in programs which never use setlocale.

    Behavior on resource exhaustion

    These comparisons deal with the robustness of various interfaces when the amount of free memory or other system resources is extremely low. Reporting failure is shaded green when it is the theoretically optimal behavior; it is shaded yellow when an alternate implementation could successfully perform the operation with no resource usage.

    Thread-local storage covers both the case of attempting to create a new thread when there is insufficient memory available to satisfy the thread-local storage requirements of all loaded modules, and the case of attempting to load a new module with thread-local storage via dlopen when there is insufficient memory available to satisfy the storage requirements of all extant threads.

    In the case of pthread_cancel, NPTL dynamically loads libgcc_s.so.1 at runtime upon the first cancellation request, and aborts the program if loading fails for any reason, including but not limited to resource exhaustion.
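    "Reports failure" only helps if the caller checks for it. A minimal sketch for the regcomp/regexec row, using only the standard POSIX calls (the wrapper name match_ere is hypothetical), treats REG_ESPACE or any other error code as a failure to report rather than something to ignore:

    ```c
    #include <regex.h>
    #include <stdio.h>

    /* Match an extended regular expression against subject.
     * Returns 1 on match, 0 on no match, and -1 if compiling or matching
     * failed (for example REG_ESPACE when the library runs out of memory). */
    int match_ere(const char *pattern, const char *subject)
    {
        regex_t re;
        int rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
        if (rc != 0) {
            char msg[128];
            regerror(rc, &re, msg, sizeof msg);
            fprintf(stderr, "regcomp: %s\n", msg);
            return -1;                 /* nothing to regfree on failure */
        }
        rc = regexec(&re, subject, 0, NULL, 0);
        regfree(&re);
        if (rc == 0)
            return 1;
        if (rc == REG_NOMATCH)
            return 0;
        return -1;                     /* e.g. REG_ESPACE during matching */
    }
    ```

    For instance, match_ere("a{25}b", s) uses the same pattern as the "Regex search (a{25}b)" benchmark row; on an implementation that reports failure it returns -1 under memory pressure instead of crashing.
    
    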

    Performance comparison

    All of these figures were obtained using my libc-bench suite, in UTF-8 locales, on one particular Intel Atom N280-based machine. They are not intended to be rigorous, only to give a rough idea of relative order-of-magnitude performance.

    The tiny and big allocation figures are from b_malloc_tiny1 and b_malloc_big1. The allocation contention tests measure malloc performance when two threads are simultaneously performing allocation and free operations. In the first test (local), each thread frees its own allocations. In the second (shared), the allocating and freeing threads are often not the same, breaking thread-local arena/cache optimizations.

    The strstr figure is the max time taken by any of the strstr tests, in the interest of measuring worst-case time; which case is worst varies by implementation. glibc's bad performance could be fixed trivially by removing the code that disables the best optimization for needles shorter than 32 bytes; with this change it should match or slightly outperform musl.

    The thread create and join figure is from b_pthread_createjoin_serial1.

    ABI and versioning comparison

    Backwards compatibility means the usual thing, that new versions of the library are compatible with programs compiled against an older version. "Forwards compatibility" is a term I may have invented, but the idea it's intended to convey is that old versions of the library are compatible with programs compiled against a newer version, as long as the program does not depend on features that were missing from the older library version. In the latter case, the program would simply fail at (static or dynamic) link time with missing symbols.

    Perhaps the simplest way to think of "forwards compatibility" is that it means you're not required to upgrade the library unless a program actually needs functionality that's missing in your version.

    Symbol versioning and forwards compatibility both have merits, but they're essentially mutually exclusive.

    "Atomic upgrades" means that a single atomic filesystem operation upgrades the library, with no race condition window during which dynamic-linked programs might fail to run. The canonical way to ensure atomic upgrades is having the whole library in a single .so file.
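    On POSIX systems the usual single atomic operation is rename(2) over an existing file. A minimal sketch, assuming the temporary file is on the same filesystem as the target (the helper name and any paths are illustrative, not taken from a real package manager):

    ```c
    #include <stdio.h>

    /* Replace dst with new contents in one atomic step: write everything to
     * tmp (which must be on the same filesystem as dst), then rename it over
     * dst. On POSIX, rename atomically replaces an existing target, so other
     * processes see either the complete old file or the complete new one.
     * Durability (fsync of the file and directory) is omitted for brevity.
     * Returns 0 on success, -1 on failure. */
    int atomic_replace(const char *dst, const char *tmp,
                       const void *data, size_t len)
    {
        FILE *f = fopen(tmp, "wb");
        if (!f)
            return -1;
        if (fwrite(data, 1, len, f) != len) {
            fclose(f);
            remove(tmp);
            return -1;
        }
        if (fclose(f) != 0) {
            remove(tmp);
            return -1;
        }
        return rename(tmp, dst) == 0 ? 0 : -1;
    }
    ```

    A program started during the upgrade maps either the whole old library or the whole new one. A library split across several .so files would need several such renames, and the gap between them is the race window the text describes.
    
    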

    Algorithms comparison

    When comparing substring search algorithms, m typically refers to the length of the needle (substring) and n typically refers to the length of the haystack (string to be searched). The two-way algorithm is O(n), and with the Boyer-Moore-like improvements musl uses (and which glibc uses, but only for extremely long needles), typical runtime is proportional to n/m. The naive algorithm is O(nm).
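    Two-way itself is too long for a short example. To illustrate the Boyer-Moore-like skipping that makes typical runtime proportional to n/m, here is a sketch of the simpler Boyer-Moore-Horspool algorithm. This is a swapped-in technique, not what musl or glibc use, and its worst case is still O(nm), which is exactly what two-way avoids (the function name is hypothetical):

    ```c
    #include <stddef.h>
    #include <string.h>

    /* Boyer-Moore-Horspool: return a pointer to the first occurrence of
     * needle in haystack, or NULL if there is none. */
    const char *horspool_search(const char *haystack, const char *needle)
    {
        size_t n = strlen(haystack), m = strlen(needle);
        if (m == 0)
            return haystack;
        if (m > n)
            return NULL;

        /* Bad-character table: how far the window may slide, based on the
         * haystack byte aligned with the needle's last position. */
        size_t shift[256];
        for (size_t i = 0; i < 256; i++)
            shift[i] = m;
        for (size_t i = 0; i + 1 < m; i++)
            shift[(unsigned char)needle[i]] = m - 1 - i;

        for (size_t pos = 0; pos + m <= n; ) {
            if (memcmp(haystack + pos, needle, m) == 0)
                return haystack + pos;
            /* Skips up to m bytes at once, hence roughly n/m steps typically. */
            pos += shift[(unsigned char)haystack[pos + m - 1]];
        }
        return NULL;
    }
    ```
    
    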

    Backtracking regular expression implementations are simple to write, but have pathologically bad performance on many simple real-world expressions, and fail to take advantage of the regularity of the language.
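    To make "backtracking" concrete, here is a sketch of the well-known Kernighan and Pike matcher, which supports only literal characters, `.`, `*`, `^` and `$`. It is not dietlibc's code; it only illustrates the technique. Every `*` tries each possible split of the remaining input, so a pattern like a*a*a*b run against a long string of a's with no b retries a number of splits that grows as a power of the input length, while a DFA examines each byte once (the entry point name bt_match is hypothetical):

    ```c
    static int matchhere(const char *re, const char *text);

    /* c* followed by re: try zero, one, two, ... copies of c, backtracking
     * to the next length whenever the rest of the pattern fails. */
    static int matchstar(int c, const char *re, const char *text)
    {
        do {
            if (matchhere(re, text))
                return 1;
        } while (*text != '\0' && (*text++ == c || c == '.'));
        return 0;
    }

    /* Match re against the beginning of text. */
    static int matchhere(const char *re, const char *text)
    {
        if (re[0] == '\0')
            return 1;
        if (re[1] == '*')
            return matchstar(re[0], re + 2, text);
        if (re[0] == '$' && re[1] == '\0')
            return *text == '\0';
        if (*text != '\0' && (re[0] == '.' || re[0] == *text))
            return matchhere(re + 1, text + 1);
        return 0;
    }

    /* Search for re anywhere in text; returns 1 on match, 0 otherwise. */
    int bt_match(const char *re, const char *text)
    {
        if (re[0] == '^')
            return matchhere(re + 1, text);
        do {                           /* try every starting position */
            if (matchhere(re, text))
                return 1;
        } while (*text++ != '\0');
        return 0;
    }
    ```
    
    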

    The naive quicksort dietlibc uses has an O(n) worst-case space requirement on the stack, meaning it can and will lead to stack-overflow crashes in real-world usage. This can be fixed by always recursing into the smaller partition and handling the larger one with a loop (tail-call elimination), which bounds the stack depth at O(log n). Quicksort's worst-case time is also O(n²); typical performance is much better, but the worst case is very bad. Shell sort is typically O(n^α) where 1<α<2, though it can be optimized to O(n(log n)²). Determining the characteristics of uClibc's version would require some analysis. Smooth sort is O(n log n) and interpolates smoothly down to O(n) proportional roughly to the degree to which the input is already sorted. Intro sort is a variant of quicksort which detects worst-case recursion and switches to heap sort to maintain O(n log n) bounds.
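    The stack-depth fix can be sketched as follows: recurse only into the smaller partition and loop on the larger one, so recursion depth stays around log2(n). This is a simplified int-only sketch with hypothetical names, not dietlibc's or any libc's qsort; its worst-case time is still O(n²), which introsort addresses by falling back to heap sort:

    ```c
    #include <stddef.h>

    static void swap_int(int *a, int *b)
    {
        int t = *a;
        *a = *b;
        *b = t;
    }

    /* Lomuto partition of a[lo..hi] (inclusive) around the middle element;
     * returns the pivot's final index. */
    static size_t lomuto_partition(int *a, size_t lo, size_t hi)
    {
        size_t mid = lo + (hi - lo) / 2;
        swap_int(&a[mid], &a[hi]);     /* move pivot to the end */
        int pivot = a[hi];
        size_t i = lo;
        for (size_t j = lo; j < hi; j++)
            if (a[j] < pivot)
                swap_int(&a[i++], &a[j]);
        swap_int(&a[i], &a[hi]);
        return i;
    }

    /* Sort a[lo..hi] inclusive. Only the smaller side is handled by a
     * recursive call; the larger side is handled by the loop, so each
     * nested call covers at most half the elements of its caller. */
    static void quicksort_range(int *a, size_t lo, size_t hi)
    {
        while (lo < hi) {
            size_t p = lomuto_partition(a, lo, hi);
            if (p - lo < hi - p) {
                if (p > lo)
                    quicksort_range(a, lo, p - 1);
                lo = p + 1;
            } else {
                if (p < hi)
                    quicksort_range(a, p + 1, hi);
                if (p == 0)
                    break;             /* left side empty; avoid underflow */
                hi = p - 1;
            }
        }
    }

    void bounded_stack_quicksort(int *a, size_t n)
    {
        if (n > 1)
            quicksort_range(a, 0, n - 1);
    }
    ```
    
    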

    Features comparison

    Exact floating point printing refers to the ability to print the exact value of floating point numbers with printf when the specified precision is high enough. For instance, as a double-precision value, 0.1 is 0.1000000000000000055511151231257827021181583404541015625, which is the dyadic rational 115292150460684704/2⁶⁰. Perhaps more usefully, the (exactly representable) number 2⁻⁶⁰ should print as 0.000000000000000000867361737988403547205962240695953369140625 rather than some inexact approximation.
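    A quick way to check whether a libc prints exactly is to request enough digits and compare against the exact expansions given in the paragraph. A minimal sketch with a hypothetical helper name; 0x1p-60 is the C99 hexadecimal floating literal for 2⁻⁶⁰:

    ```c
    #include <stdio.h>
    #include <string.h>

    /* Return 1 if printf's "%.*f" rendering of x with `digits` digits after
     * the decimal point is exactly `expected`, 0 otherwise. */
    int prints_exactly(double x, int digits, const char *expected)
    {
        char buf[128];
        int len = snprintf(buf, sizeof buf, "%.*f", digits, x);
        if (len < 0 || (size_t)len >= sizeof buf)
            return 0;                  /* formatting error or truncated output */
        return strcmp(buf, expected) == 0;
    }
    ```

    For example, prints_exactly(0.1, 55, "0.1000000000000000055511151231257827021181583404541015625") and prints_exactly(0x1p-60, 60, ...) with the 2⁻⁶⁰ expansion should both return 1 on a library marked "yes" for exact printing; an implementation that stops converting after about 17 significant digits may print zeros instead.
    
    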

    A complete C99 math library consists of the new single-precision and extended-precision versions of all the previously existing math functions, as well as their complex versions and tgmath.h.

    POSIX threads refers to threads with real POSIX semantics, not the historical broken LinuxThreads (where each thread behaves like a distinct process) or similar implementations.

    POSIX localedef refers to the ability to define custom locales, including charsets, etc.

    TCB passwords are a feature from Openwall which move the password hashes from /etc/shadow to /etc/tcb/username/shadow. This allows users to change passwords and allows programs running as the user (for example, screen lockers) to authenticate the user's password without special suid or sgid privileges.

    Linux extensions refer to kernel interfaces provided by Linux outside the scope of POSIX and historical behavior - epoll, signalfd, extended attributes, capabilities, module loading, and so on.
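    As a small illustration of one such extension, here is a sketch that uses epoll, a Linux-only readiness API declared in <sys/epoll.h>, to wait on a pipe after writing one byte into it (the function name is hypothetical):

    ```c
    #include <sys/epoll.h>
    #include <unistd.h>

    /* Create a pipe, register its read end with epoll, write one byte, and
     * return how many descriptors epoll_wait reports ready (expected: 1).
     * Returns -1 on any error. */
    int epoll_pipe_demo(void)
    {
        int fds[2];
        if (pipe(fds) != 0)
            return -1;

        int ep = epoll_create1(0);
        if (ep < 0) {
            close(fds[0]);
            close(fds[1]);
            return -1;
        }

        struct epoll_event ev = { .events = EPOLLIN, .data.fd = fds[0] };
        int ready = -1;
        if (epoll_ctl(ep, EPOLL_CTL_ADD, fds[0], &ev) == 0 &&
            write(fds[1], "x", 1) == 1) {
            struct epoll_event out;
            ready = epoll_wait(ep, &out, 1, 1000);   /* 1 second timeout */
            if (ready == 1 && out.data.fd != fds[0])
                ready = -1;                          /* unexpected descriptor */
        }
        close(ep);
        close(fds[0]);
        close(fds[1]);
        return ready;
    }
    ```
    
    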

    Target architectures comparison

    There are a number of conformance issues in glibc's x32 support, the most notable being that it defines the tv_nsec member of struct timespec as long long despite both POSIX and C11 requiring it to have type long. This discrepancy affects use with formatted printing functions and use of pointers to the member, among other things. A number of other interfaces also have been changed to use long long instead of long in structures; in many cases there is no standard governing the affected interface, but the changes break the interface contract published in other documentation such as the Linux man pages.
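    Portable code can sidestep the printf part of this discrepancy by casting explicitly, so the format specifiers stay correct whether tv_nsec is long (as POSIX and C11 require) or long long (as the text describes for glibc's x32 port). A small sketch with a hypothetical helper name:

    ```c
    #include <stdio.h>
    #include <time.h>

    /* Format a timespec as "seconds.nanoseconds" (nanoseconds zero-padded
     * to 9 digits). The casts make %lld and %ld match the arguments
     * regardless of the underlying types of tv_sec and tv_nsec. */
    int format_timespec(char *buf, size_t size, const struct timespec *ts)
    {
        return snprintf(buf, size, "%lld.%09ld",
                        (long long)ts->tv_sec, (long)ts->tv_nsec);
    }
    ```

    The cast fixes formatted printing only; code that takes a long * to tv_nsec still breaks on such a port.
    
    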

    uClibc's microblaze port is marked partial because it lacks support for threads and possibly other core features.

    Ports marked "experimental" are those documented as such; this may mean some functionality is broken and/or ABI is not stable.

    Build environment comparison

    "Legacy-code-friendly headers" means that the system C header files evolved out of historical practice, and by default define/declare many things they shouldn't but which some legacy code might expect. They typically rely on deep levels of nested inclusion and complex conditional compilation.

    "Lightweight headers" are roughly the opposite, written from scratch to match the C and POSIX standards, with minimal nested inclusion and preprocessor conditionals. This leads to an enormous performance advantage compiling large numbers of small files, but it also means poorly-written programs that relied on certain implementation-specific legacy characteristics might need minor fixes to compile.

    Some of the libraries reviewed are virtually impossible to use without having built GNU binutils and gcc specifically targeting them (i.e. a native toolchain). Others make it easy to use an existing toolchain originally targeting a different library, overriding certain compiler and linker options to use the alternate library implementation.

    Respect for the C and POSIX namespaces means that the namespace used by the standard C and standard POSIX functions and headers conforms to what these standards say about which names are reserved for the implementation versus reserved for the application. One common area of non-conformance is remapping functions like open, lseek, etc. to open64, lseek64, etc. - names which are reserved for the application. This is flagged as "LFS64 problems" in the table.

    Security/hardening comparison

    "Attention to corner cases" means that the library follows a general philosophy of being careful to support all possible inputs that don't explicitly invoke undefined behavior, especially when the input may come from a source external to the program. Over-use of malloc is flagged in the comparison when some interfaces that should not have any failure cases have created artificial ones due to the possibility of memory exhaustion.

    An unsafe UTF-8 decoder is one which fails to detect invalid sequences and happens to decode them as aliases for valid characters.
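    A safe decoder must reject every byte sequence that is not the shortest encoding of a valid Unicode scalar value. A minimal sketch (hypothetical function name, not taken from any of the libraries compared) that rejects the classic alias C0 AF for '/', along with UTF-16 surrogates and values above U+10FFFF:

    ```c
    #include <stddef.h>
    #include <stdint.h>

    /* Decode one UTF-8 sequence from s (n bytes available). On success,
     * store the code point in *cp and return the sequence length (1-4).
     * Return -1 for malformed input: stray continuation bytes, invalid lead
     * bytes, truncated sequences, overlong forms, surrogates, or values
     * above U+10FFFF. */
    int utf8_decode_one(const unsigned char *s, size_t n, uint32_t *cp)
    {
        if (n == 0)
            return -1;
        unsigned char b0 = s[0];
        int len;
        uint32_t c, min;

        if (b0 < 0x80) { *cp = b0; return 1; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; c = b0 & 0x1F; min = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; c = b0 & 0x0F; min = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; c = b0 & 0x07; min = 0x10000; }
        else return -1;                /* continuation byte or 0xF8..0xFF */

        if (n < (size_t)len)
            return -1;                 /* truncated sequence */
        for (int i = 1; i < len; i++) {
            if ((s[i] & 0xC0) != 0x80)
                return -1;             /* not a continuation byte */
            c = (c << 6) | (s[i] & 0x3F);
        }
        if (c < min)
            return -1;                 /* overlong, e.g. C0 AF for '/' */
        if (c >= 0xD800 && c <= 0xDFFF)
            return -1;                 /* UTF-16 surrogate halves */
        if (c > 0x10FFFF)
            return -1;                 /* beyond the Unicode range */
        *cp = c;
        return len;
    }
    ```
    
    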

    Heap corruption detection means malloc makes an effort to detect, report, and abort when it detects double-free, attempts to free a pointer not obtained via malloc, etc.

    Misc. comparisons

    The choice of license affects the usability of a standard library implementation. GPL v2-only is shaded as the "worst" choice, in that it is incompatible with a large volume of Open Source/Free Software, namely anything using GPL v3-only. LGPL v2.1-only is much less problematic; it does not allow creation of a new LGPL-licensed library by merging with LGPL v3-only code, but it allows the merged program to be released under version 3 or later of the GPL. LGPL v2.1-or-later is very flexible, and MIT or BSD even more so.
