具有动态扇出的缓存友好B+树节点
Cache-Friendly B+Tree Nodes with Dynamic Fanout

原始链接: https://jacobsherin.com/posts/2025-08-18-bplustree-struct-hack/

## B+树节点内存布局:性能权衡 为了在B+树中实现高性能,需要为每个节点提供连续的内存布局,以最大化CPU缓存的利用率。标准的C++ `std::vector` 会引入内存间接寻址,阻碍了这一目标。解决方案是采用一种“结构技巧”——利用柔性数组成员(C99和C++11标准化)来定义一个类,其最后一个成员是动态大小的数组。 这种技术允许为节点的元数据和数据分配单个连续的内存块,避免了指针追逐和缓存缺失。然而,它也带来了代价:需要手动内存管理,偏离了惯用的C++实践。必须小心地编排释放内存,并且向派生类添加成员可能导致数据损坏。 此外,该实现有效地“重新发明”了`std::vector` 的一部分,并引入了隐藏的约束——特别是,要求在节点内使用的所有数据类型都必须是可平凡复制的。尽管存在这些缺点,但优化内存访问所带来的性能提升使得这种权衡对于要求苛刻的应用来说是必要的。

黑客新闻新 | 过去 | 评论 | 提问 | 展示 | 招聘 | 提交登录 缓存友好的B+树节点,具有动态扇出 (jacobsherin.com) 10 分,由 jasim 52 分钟前发布 | 隐藏 | 过去 | 收藏 | 讨论 考虑申请YC冬季2026批次!申请截止日期为11月10日 指南 | 常见问题 | 列表 | API | 安全 | 法律 | 申请YC | 联系 搜索:
相关文章

原文

For a high-performance B+Tree, the memory layout of each node must be a single contiguous block. This improves locality of reference, increasing the likelihood that the node's contents reside in the CPU cache.

In C++, achieving this means forgoing the use of std::vector, as it introduces a layer of indirection through a separate memory allocation. The solution to this problem though inevitably increases the implementation complexity and is mired with hidden drawbacks. Nevertheless, this is still a necessary trade-off for unlocking high performance.

  +----------------------+
| Node Metadata Header |
+----------------------+
| node_type_ | | max_size_ | | node_latch_ | | iter_end_ |--------------------+
+----------------------+ |
| Node Data | |
+----------------------+ |
| entries_[0] | | entries_[1] | | |
| entries_[2] | + used space |
| ... | | |
| entries_[k] | +----------------------+ | | entries_[k+1], which is one-past-the-last
| (unused space) | entry in the node.
| ... |
+----------------------+
Fig 1. Memory Layout of a B+Tree Node as a single contiguous block in heap

Challenges

flexible array member.

The C++11 standard formally incorporates the flexible array member, referring to it as an array of unknown bound when it is the last member of a struct.

Arrays of unknown bound

If expr is omitted in the declaration of an array, the type declared is "array of unknown bound of T", which is a kind of incomplete type, ...

extern int x[];      
int a[] = {1, 2, 3};

This means that in C++, the size can be omitted from the final array declaration (e.g. entries_[]), and the code will compile, enabling the same pattern.

B+Tree Node Declaration

placement new syntax which only constructs an object in a preallocated memory buffer provided by us. We know exactly how much space to allocate, which is information the standard new operator does not have in this scenario because of the flexible array member.


static BPlusTreeNode *Get(int p_fanout) {
size_t buf_size = sizeof(BPlusTreeNode) + p_fanout * sizeof(KeyValuePair);


void *buf = ::operator new(buf_size);


auto node = new(buf) BPlusTreeNode(p_fanout);

return node;
}

The result is a cache-friendly B+Tree node with a fanout that can be configured at runtime.

The Price Of Fine-Grained Control