Garbage collection is useful

Original link: https://dubroy.com/blog/garbage-collection-is-useful/

## Insights on incremental parsing and garbage collection

While building a bidirectional text editor with Ohm (a parser with incremental parsing support) and ProseMirror, the author hit a performance problem. The goal was to use Ohm's incremental parsing to efficiently update the ProseMirror document whenever the underlying text changed.

The initial solution "traced" all nodes to identify those deleted during an edit, essentially a garbage-collection approach. But this defeated the purpose of incrementality, since it required scanning the entire document even for small changes.

Recalling a key insight from the paper "A Unified Theory of Garbage Collection", the author switched to a reference-counting approach. Instead of tracing the *live* nodes, they track the *dead* ones by decrementing reference counts as the document is updated. This makes it possible to quickly identify deleted nodes without walking the entire document, significantly improving the performance of incremental updates. The core idea is to focus on identifying what was *removed* rather than what *remains*, reflecting the duality between tracing and reference counting in garbage collection.

## Hacker News discussion summary: garbage collection vs. reference counting

A Hacker News thread sparked a discussion about garbage collection (GC) versus reference counting. The central debate: is reference counting an "inferior" form of GC? Reference counting manages object lifetimes effectively by decrementing a count when a reference disappears, but it struggles with cycles, where objects reference each other and no individual count can ever reach zero.

Participants highlighted the trade-offs: reference counting offers deterministic performance and can be implemented as a library in languages like C++ and Rust, providing controlled cleanup. Tracing GC, though often considered superior, can introduce pauses and complexity.

The conversation also touched on the importance of symmetry in software design, and on how abstractions can "leak" underlying complexity and affect performance. Some commenters shared experiences building custom GC systems, noting the challenges of managing memory efficiently, especially with persistent storage. Ultimately, the best approach depends on the specific application and its performance requirements, with GC being essential for complex scenarios such as concurrent data structures.

## Original article
November 14, 2025

A long time ago, I spent a few years working on garbage collection in the J9 Java VM. And even though I’ve since mostly done higher-level stuff, having a deeper knowledge of GC has continued to come in useful.

Yesterday, it was an insight from one of my favourite GC papers, A Unified Theory of Garbage Collection, which helped me solve a tricky problem.

ohm’s incremental parsing

I’m working with a team that’s using Ohm to parse text documents and render a rich text version in ProseMirror. The goal is bidirectional updates: changes in ProseMirror should propagate to the text version, and vice versa.

Ohm supports incremental parsing, which means that if you parse some text and then make a small edit, it can quickly reparse by reusing portions of the previous result.
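As a concrete sketch (the toy `Doc` grammar is an illustration, not the actual grammar from this project), Ohm's incremental matcher API looks like this:

```ts
import * as ohm from "ohm-js";

// A toy grammar: a document is a sequence of words separated by spaces.
const g = ohm.grammar(`
  Doc {
    doc = word (" " word)*
    word = letter+
  }
`);

// A matcher holds the input and the memoized parse state between edits.
const m = g.matcher();
m.setInput("hello world");
console.log(m.match().succeeded()); // true

// Apply a small edit: replace "world" (indices 6–11) with "there".
// Ohm reparses incrementally, reusing results outside the edited range.
m.replaceInputRange(6, 11, "there");
console.log(m.match().succeeded()); // true
```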

It also supports a limited form of incremental transforms. You can define an attribute, which is kind of like a memoized visitor, and the attribute value for a given node will only need to be recalculated if the edit affected one of its subtrees. So you can easily implement a form of persistent data structure, where each new value (e.g., an AST) shares a bunch of structure with the previous one.
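Building on the sketch above, an attribute is defined via a semantics object, with one action per grammar rule; the `wordCount` attribute here is a toy example:

```ts
// An attribute is a memoized, bottom-up computation over the parse tree.
const semantics = g.createSemantics().addAttribute("wordCount", {
  // `first` is the first word; `rest` is the iteration of remaining words.
  doc(first, _spaces, rest) {
    return first.wordCount + rest.children.reduce((sum, w) => sum + w.wordCount, 0);
  },
  word(_letters) {
    return 1;
  },
});

console.log(semantics(m.match()).wordCount); // 2
```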

## the problem

Using this machinery, I tried making a `pmNodes` attribute that produced a ProseMirror document for a given input. When the text document is edited, it produces a new tree which shares a bunch of nodes with the previous one.

The `pmNodes` tree before and after an edit.
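The actual implementation isn't shown here, but a `pmNodes`-style attribute might look roughly like this, reusing the toy grammar above with ProseMirror's basic schema:

```ts
import { schema } from "prosemirror-schema-basic";

// Hypothetical sketch: map each word to a paragraph node, and the whole
// input to a doc node. Words untouched by an edit keep their memoized
// ProseMirror nodes, so the old and new trees share structure.
semantics.addAttribute("pmNodes", {
  doc(first, _spaces, rest) {
    const paragraphs = [first.pmNodes, ...rest.children.map((w) => w.pmNodes)];
    return schema.node("doc", null, paragraphs);
  },
  word(_letters) {
    return schema.node("paragraph", null, [schema.text(this.sourceString)]);
  },
});
```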

Then, my plan was to construct a ProseMirror transaction that would turn the old tree into the new one. To do that, it’s helpful to know which nodes appeared in the old document, but not the new one.

My first implementation of this was equivalent to tracing garbage collection — after each edit, I walked the entire document, and recorded all the nodes in a Set. The difference between the sets told me which nodes had died.
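A sketch of that tracing-style version, with a placeholder `TreeNode` type standing in for the real node type:

```ts
// Tracing-style: walk a whole tree and collect every node into a set.
type TreeNode = { children: TreeNode[] };

function collectNodes(root: TreeNode, out = new Set<TreeNode>()): Set<TreeNode> {
  out.add(root);
  for (const child of root.children) collectNodes(child, out);
  return out;
}

// Dead nodes are those reachable from the old root but not the new one.
// Note that BOTH trees are fully traversed, even for a tiny edit.
function deadNodes(oldRoot: TreeNode, newRoot: TreeNode): TreeNode[] {
  const live = collectNodes(newRoot);
  return [...collectNodes(oldRoot)].filter((n) => !live.has(n));
}
```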

But this kind of defeats the purpose of incrementality — if you have a long document and make a small edit, it should be possible to process the change without visiting every node in the document.

## the solution

Then I remembered A Unified Theory of Garbage Collection, a 2004 OOPSLA paper by some former colleagues of mine:

> Tracing and reference counting are uniformly viewed as being fundamentally different approaches to garbage collection that possess very distinct performance properties. We have implemented high-performance collectors of both types, and in the process observed that the more we optimized them, the more similarly they behaved — that they seem to share some deep structure.
>
> We present a formulation of the two algorithms that shows that they are in fact duals of each other. Intuitively, the difference is that tracing operates on live objects, or “matter”, while reference counting operates on dead objects, or “anti-matter”. For every operation performed by the tracing collector, there is a precisely corresponding anti-operation performed by the reference counting collector.

This was the answer I needed! Rather than visiting all the live objects, I wanted to only visit the dead ones, and reference counting would let me do that.

So I added a way of maintaining a reference count for all the nodes in the doc. When we produce a new document, we decrement the reference count of the old root node (it will always be 0 afterwards, since nothing else references the root). Whenever a node's count reaches zero, we recursively decrement the ref counts of its children, and so on; reused nodes keep a nonzero count, so the walk stops there. This gives me exactly what I wanted — a way to find all the nodes that were not reused, without having to visit most of the nodes in the doc.
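Here's a sketch of that scheme, reusing the placeholder `TreeNode` type from the tracing example (a reconstruction, not the project's actual code):

```ts
// Each node's count = number of parents (plus the document root slot)
// that currently reference it.
const refCounts = new Map<TreeNode, number>();

function incRef(node: TreeNode): void {
  refCounts.set(node, (refCounts.get(node) ?? 0) + 1);
}

// Building a node counts one reference to each child. Reused children
// get an extra increment when a new parent is built around them.
function makeNode(children: TreeNode[]): TreeNode {
  for (const child of children) incRef(child);
  return { children };
}

// Decrement a node's count; if it drops to zero, the node is dead, so
// release its children too. Only dead nodes are ever visited.
function decRef(node: TreeNode, dead: TreeNode[] = []): TreeNode[] {
  const count = (refCounts.get(node) ?? 0) - 1;
  refCounts.set(node, count);
  if (count === 0) {
    dead.push(node);
    for (const child of node.children) decRef(child, dead);
  }
  return dead;
}

// After an edit: incRef(newRoot), then decRef(oldRoot) returns exactly
// the nodes the new document did not reuse.
```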
