Corner-Case RCU Implementations

原始链接: https://people.kernel.org/paulmck/stupid-rcu-tricks-corner-case-rcu-implementations

## Unorthodox RCU Implementations

Read-copy update (RCU) is a synchronization mechanism that developers often use without realizing it, which highlights that RCU can be implemented in unorthodox and sometimes dangerous ways.

One approach is **timed-wait RCU**, in which a fixed period of time serves as the grace period. While this may be workable in hard real-time systems (such as the 15-second wait proposed for an early proprietary UNIX kernel), it is very dangerous in non-real-time environments. Even DEC OSF/1 used a similar 10-minute wait for vnode reclamation, prioritizing performance over guaranteed safety.

Another surprising implementation is **fixed-buffer RCU**, which exploits a bounded memory quarantine, such as the one used by the Linux kernel address sanitizer (KASAN), to define the grace period implicitly. The assumption is that readers complete before the memory is reallocated.

These approaches show that an RCU grace period can be determined by either *time* or *space*. Outside of tightly controlled environments, however, relying on them risks memory corruption unless reader duration and update rate are carefully bounded. Historically, some systems even traded reliability for performance, accepting crashes in exchange for speed, which serves as a cautionary tale for unorthodox RCU implementations. Thorough testing remains essential for avoiding hidden bugs, as illustrated by a past RCU defect that went undiscovered until targeted testing found it.


Original Text

RCU has a way of popping up unexpectedly. In the words of Fedor Pikus, “In fact, you may already be using the RCU approach in your program without realizing it!” This post will describe uses of a couple of highly unorthodox (some might say “completely irresponsible”) types of RCU implementations, both accidental and otherwise.

Timed-Wait RCU

One very simple class of RCU implementations uses a fixed period of time for the grace period. This can clearly work well in hard real-time kernels and applications, though it has also been used in prototype non-real-time kernels. In an early 1990s verbal discussion, none other than Van Jacobson pointed out that a 15-second delay would suffice in the research version of a proprietary-UNIX OS that he was working with. I responded (also verbally) that in DYNIX/ptx interrupt handlers sometimes executed for more than a minute (as in more than 15 seconds), but that Jack Slingwine and I had a way to get the same low-overhead/high-scalability effect without the need for hard real-time constraints on readers. Van expressed interest, so I sent him an early draft of the first RCU conference paper. Some years later, I had the privilege of hearing Van say nice things about Linux-kernel RCU.

For the benefit of any long-time RCU users who (like me) have a “make it make sense” filter in their heads, here is a Linux-kernel implementation of a synchronize_rcu() for Van's RCU:

```c
void synchronize_rcu(void)
{
	schedule_timeout_uninterruptible(15 * HZ);
}
```

In other words, synchronize_rcu() does a fixed wait of 15 seconds, after which it is assumed, without proof, that all pre-existing readers have completed.
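To make the updater side of such a scheme concrete, here is a hypothetical user-space sketch of a timed-wait updater. All of the names (`timed_wait_synchronize()`, `update_config()`, `global_config`) are invented for illustration, the grace period is parameterized rather than hard-coded at 15 seconds, and memory ordering is deliberately ignored, as befits a "completely irresponsible" implementation:

```c
#include <stdlib.h>
#include <unistd.h>

struct config {
	int value;
};

static struct config *global_config;  /* read by unmarked readers */

/* Timed-wait "grace period": assume, without proof, that all
 * pre-existing readers finish within grace_seconds. */
static void timed_wait_synchronize(unsigned int grace_seconds)
{
	sleep(grace_seconds);
}

/* Publish a new config, wait out the fixed delay, then free the
 * old one, which is "safe" only if the timing assumption holds. */
static void update_config(int new_value, unsigned int grace_seconds)
{
	struct config *newc = malloc(sizeof(*newc));
	struct config *oldc = global_config;

	newc->value = new_value;
	global_config = newc;  /* publish; no memory barrier shown */
	timed_wait_synchronize(grace_seconds);
	free(oldc);
}
```

Note that readers need no markers at all: correctness rests entirely on the unproven assumption that every reader completes within the fixed delay.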

In the mid-1990s, Aju John wrote the USENIX paper Dynamic vnodes — Design and Implementation, which proposed a fixed 10-minute wait time for reclaiming vnodes in a proprietary UNIX system, DEC OSF/1 Version 3.0.

This approach might actually make sense in a hard-real-time environment, but would of course be extremely dangerous even there. On the other hand, you have to admit that Van and Aju were taking a no-holds-barred approach to performance and scalability, and doing so very early in the game! And Paul Khuong reports that this technique was recently used in production for an extended period, until a more principled and sufficiently performant technique was put in its place.

As far as I know, none of these timed-wait RCU use cases remains in production use, but the fact remains that timed-wait RCU really has been used in production. And maybe it is still being used, dangerous though use of timed-wait RCU might be outside of hard-real-time environments! On the other hand, this is a rare instance of hard real-time making something way simpler, at least as long as non-real-time threads are prohibited from using RCU read-side critical sections.

Fixed-Buffer RCU

The previous section covered RCU grace-period detection based purely on time. But a recent discussion with Hui Xie led me to wonder about instead basing them purely on space, as in address space. Is this possible?

The Linux-kernel address sanitizer (KASAN) keeps newly freed blocks of memory in a fixed-size quarantine. This is done to increase the probability that a use-after-free access occurs while the corresponding block of memory has not yet been reallocated, thus reducing the number of false negatives that could otherwise occur if freed memory were instead immediately reallocated.

However, this can be thought of as another corner-case RCU implementation, in which the “readers” are assumed to complete before the corresponding block of memory is reallocated.

And rcutorture does something similar by deferring freeing of RCU_TORTURE_PIPE_LEN (AKA 10) of the blocks previously referenced by rcu_torture_current. In addition, there are 10 times that many elements total (AKA 100 of them). As with KASAN, the idea is to reduce the incidence of false negatives, so that an extremely long RCU reader combined with a series of extremely short RCU grace periods would still have roughly a 99% chance of each reader issuing a diagnostic for the too-short grace periods.

Although in these two examples, the intent of the fixed buffers is bug detection, they could instead be used to create an extremely limited and very sharp-edged RCU grace-period detection mechanism. So the answer to the question raised by the discussion with Hui Xie is “yes”!
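As a thought experiment, such a fixed-buffer grace period might be sketched as the following hypothetical user-space ring. This is not KASAN's actual quarantine implementation, and `QUARANTINE_LEN` and `fixed_buffer_free()` are invented names; the point is only that eviction from a fixed-size buffer stands in for a grace period:

```c
#include <stdlib.h>

/* Hypothetical fixed-buffer "grace period": every free is deferred
 * through a fixed-size ring, and the block evicted from the ring is
 * the one actually freed.  A block therefore cannot be reallocated
 * until QUARANTINE_LEN further frees have occurred, which acts as a
 * grace period only if every reader finishes within that window. */

#define QUARANTINE_LEN 10

static void *quarantine[QUARANTINE_LEN];
static unsigned int quarantine_idx;

static void fixed_buffer_free(void *p)
{
	void *evicted = quarantine[quarantine_idx];

	quarantine[quarantine_idx] = p;
	quarantine_idx = (quarantine_idx + 1) % QUARANTINE_LEN;
	free(evicted);  /* free(NULL) is a no-op while the ring fills */
}
```

The sharp edge is exactly the one noted above: a reader outliving `QUARANTINE_LEN` subsequent updates silently becomes a use-after-free.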

Time vs. Space and Implications

In short, it is possible (though risky!) to implement RCU grace-period detection purely in terms of time on the one hand and purely in terms of space on the other. This provides a nice counterpoint to RCU using both time and space for its underlying synchronization, with rcu_read_lock(), rcu_read_unlock(), synchronize_rcu(), and call_rcu() handling time and rcu_assign_pointer() and rcu_dereference() handling space.

And who knows? Perhaps someone is using fixed-buffer RCU in production. But if you are that someone, please understand that you need to sharply and deterministically bound both reader duration and update rate. Either that, or you (and your users!) need to tolerate some probability of memory corruption.

Some historical perspective is provided by the rumored 1980s DEC VAX system running BSD UNIX whose systems administrator chose to use known-buggy patches that were said to provide about a 15% performance boost. And about one extra crash per week. At the time, some felt that this was the acme of systems-administration excellence, while others felt that this was completely foolhardy.

A less intentional example is provided by none other than Linux-kernel RCU shortly after suspend and hibernation were accepted into the kernel. These features forced CPU hotplug, which silently negated my attempts to test RCU in non-CPU-hotplug kernels. This meant that rcutorture failed to spot a stupid bug that I injected into RCU that, in non-CPU-hotplug kernels, caused RCU grace periods to be a fixed few-millisecond delay.

And nobody noticed.

Aside from a developer who tested hundreds of concurrent kernel builds on a two-CPU system, who reported the bug, but disappeared when I asked for his .config. And a few months after that, Nick Piggin, whose dentry cache testing tickled this bug. Nick found and fixed my stupid bug. I then changed rcutorture to pay more attention to the .config file for the kernel under test, and to apply stress in a more targeted manner. And given today's much more aggressive Linux-kernel testing, I suspect that a similar bug today would be noticed much more quickly.

So, as always, choose wisely! ;–)

History

  • Add example synchronize_rcu() timed-wait implementation to reinforce the unorthodox nature of these RCU implementation approaches.
  • Add 1980s rumors of DEC VAX system choosing performance over reliability.