Making the Clang AST Leaner and Faster

Original link: https://cppalliance.org/mizvekov,/clang/2025/10/20/Making-Clang-AST-Leaner-Faster.html

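## Clang AST improvements speed up C++ compilation

Recent improvements to the Clang abstract syntax tree (AST) deliver measurable compile-time gains, especially for template-heavy C++ code. The core of the optimization is reducing the size and complexity of type representations in the AST. Previously, `ElaboratedType` nodes added overhead because they stored syntactic details (such as the `struct` keyword and namespace qualifiers) separately from the core type information (`RecordType`). The update eliminates `ElaboratedType` and folds that data directly into `RecordType`, giving a more compact structure. Similarly, `NestedNameSpecifier`, which represents name qualifiers, was redesigned as a single tagged pointer instead of a linked list built from multiple allocations. This cuts the number of allocations and the uniquing work that keeps type comparisons fast. The new design also caches the answers to common queries, which improves performance further. The changes will ship with Clang 22 and already show gains in real projects: the slowest `stdexec` test built 7% faster, and the Chromium build improved by 5%. These results show that targeted structural changes inside a compiler can bring significant performance benefits to large C++ codebases.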

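A recent Hacker News thread highlighted improvements to the Clang compiler's abstract syntax tree (AST) that make it leaner and faster. This matters for modern C++ codebases, which often generate very large ASTs because of heavy template use. The discussion focused on efforts to optimize this process, and commenters noted that fine-grained caching work based on the LLVM CAS library has already shipped in Apple's Clang. While the improvements were praised, some worried that they might encourage *more* complex template usage. Others debated fundamental problems with C++ generics, contrasting them with the approach taken by languages such as Java. One key challenge is balancing optimization against potential changes to data flow inside the compiler, which may be contentious despite the performance gains. Overall, the update is seen as a valuable effort to ease compile-time pain in large C++ projects.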

Original article

Modern C++ codebases — from browsers to GPU frameworks — rely heavily on templates, and that often means massive abstract syntax trees. Even small inefficiencies in Clang’s AST representation can add up to noticeable compile-time overhead.

This post walks through a set of structural improvements I recently made to Clang’s AST that make type representation smaller, simpler, and faster to create — leading to measurable build-time gains in real-world projects.

A couple of months ago, I landed a large patch in Clang that brought substantial compile-time improvements for heavily templated C++ code.

For example, in stdexec — the reference implementation of the std::execution feature slated for C++26 — the slowest test (test_on2.cpp) saw a 7% reduction in build time.

The Chromium build also showed a 5% improvement (source).

At a high level, the patch makes the Clang AST leaner: it reduces the memory footprint of type representations and lowers the cost of creating and uniquing them.

These improvements will ship with Clang 22, expected in the next few months.

How elaboration and qualified names used to work

Consider this simple snippet:

namespace NS {
  struct A {};
}
using T = struct NS::A;

The type of T (struct NS::A) carries two pieces of information:

It’s elaborated — the struct keyword appears.
It’s qualified — NS:: acts as a nested-name-specifier.
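
To make the point concrete that elaboration and qualification are purely syntactic, here is a small check that spells the same type three ways and asserts they are identical. The alias names (Plain, Qualified, Elaborated) and the helper function are illustrative only and do not appear in the patch.

```cpp
#include <cassert>
#include <type_traits>

namespace NS {
  struct A {};
  using Plain = A;                // neither elaborated nor qualified
}

using Qualified  = NS::A;         // qualified only
using Elaborated = struct NS::A;  // elaborated and qualified, like T above

// All three spellings name the same type. Only the written syntax differs,
// and that syntax is what the AST has to record as sugar.
static_assert(std::is_same_v<Qualified, Elaborated>);
static_assert(std::is_same_v<NS::Plain, Elaborated>);

// Runtime helper so the same property can be checked from a test.
inline bool sameTypeAllSpellings() {
  return std::is_same_v<Qualified, Elaborated> &&
         std::is_same_v<NS::Plain, Qualified>;
}
```

The compiler still has to remember which spelling was used for diagnostics and tooling, which is why the representation of that extra information matters.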

Here’s how the AST dump looked before this patch:

ElaboratedType 'struct NS::A' sugar
`-RecordType 'test::NS::A'
  `-CXXRecord 'A'

The RecordType represents a direct reference to the previously declared struct A — a kind of canonical view of the type, stripped of syntactic details like struct or namespace qualifiers.

Those syntactic details were stored separately in an ElaboratedType node that wrapped the RecordType.

Interestingly, an ElaboratedType node existed even when no elaboration or qualification appeared in the source (example). This was needed to distinguish between an explicitly unqualified type and one that lost its qualifiers through template substitution.

However, this design was expensive: every ElaboratedType node consumed 48 bytes, and creating one required extra work to uniquify it — an important step for Clang’s fast type comparisons.

A more compact representation

The new approach removes ElaboratedType entirely. Instead, elaboration and qualifiers are now stored directly inside RecordType.

The new AST dump for the same example looks like this:

RecordType 'struct NS::A' struct
|-NestedNameSpecifier Namespace 'NS'
`-CXXRecord 'A'

The struct elaboration now fits into previously unused bits within RecordType, while the qualifier is tail-allocated when present — making the node variably sized.

This change both shrinks the memory footprint and eliminates one level of indirection when traversing the AST.
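
A minimal sketch of the idea, under loud assumptions: ToyRecordType, ToyQualifier, and Keyword are hypothetical stand-ins rather than Clang's classes, and the layout is simplified. It shows a keyword packed into a 2-bit field and a qualifier that only occupies memory when present, placed directly after the node.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

enum class Keyword : std::uint8_t { None, Struct, Class, Union };

// Toy stand-in for a qualifier (hypothetical, one pointer wide).
struct ToyQualifier { const char *name; };

class ToyRecordType {
  const char *declName;
  unsigned keyword : 2;       // elaboration keyword packed into spare bits
  unsigned hasQualifier : 1;  // set when a trailing qualifier slot exists

  ToyRecordType(const char *n, Keyword k, bool q)
      : declName(n), keyword(static_cast<unsigned>(k)), hasQualifier(q) {}

public:
  // The node is variably sized: the qualifier slot is only paid for when used.
  static std::size_t allocSize(bool withQualifier) {
    return sizeof(ToyRecordType) + (withQualifier ? sizeof(ToyQualifier) : 0);
  }

  static ToyRecordType *create(const char *n, Keyword k, const ToyQualifier *q) {
    void *mem = ::operator new(allocSize(q != nullptr));
    auto *t = new (mem) ToyRecordType(n, k, q != nullptr);
    if (q) new (t + 1) ToyQualifier(*q);  // construct in the trailing storage
    return t;
  }

  static void destroy(ToyRecordType *t) {
    t->~ToyRecordType();  // ToyQualifier is trivially destructible
    ::operator delete(t);
  }

  const char *getName() const { return declName; }
  Keyword getKeyword() const { return static_cast<Keyword>(keyword); }
  const ToyQualifier *getQualifier() const {
    return hasQualifier ? reinterpret_cast<const ToyQualifier *>(this + 1)
                        : nullptr;
  }
};

// The trailing object starts right after the node, so it must not need
// stricter alignment than the node itself.
static_assert(alignof(ToyQualifier) <= alignof(ToyRecordType));
```

Compared with a separate wrapper node, a reader of this toy node reaches the keyword and the qualifier without following another pointer, and there is one fewer object to allocate.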

Representing `NestedNameSpecifier`

NestedNameSpecifier is Clang’s internal representation for name qualifiers.

Before this patch, it was represented by a pointer (NestedNameSpecifier*) to a uniqued structure that could describe:

1. The global namespace (::)
2. A named namespace (including aliases)
3. A type
4. An identifier naming an unknown entity
5. A __super reference (Microsoft extension)

For all but cases (1) and (5), each NestedNameSpecifier also held a prefix — the qualifier to its left.

For example:

Namespace::Class::NestedClassTemplate<T>::XX

This would be stored as a linked list:

[id: XX] -> [type: NestedClassTemplate<T>] -> [type: Class] -> [namespace: Namespace]

Internally, that meant seven allocations totaling around 160 bytes:

NestedNameSpecifier (identifier) – 16 bytes
NestedNameSpecifier (type) – 16 bytes
TemplateSpecializationType – 48 bytes
QualifiedTemplateName – 16 bytes
NestedNameSpecifier (type) – 16 bytes
RecordType – 32 bytes
NestedNameSpecifier (namespace) – 16 bytes
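
The chain and the sizes above can be modeled roughly as follows. OldSpecifier is a hypothetical, heavily simplified stand-in for the pre-patch node (the real one did not store strings), and the size table simply repeats the figures quoted in the list.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the pre-patch qualifier node.
struct OldSpecifier {
  enum Kind { Global, Namespace, Type, Identifier, Super } kind;
  std::string text;
  const OldSpecifier *prefix;  // the qualifier to the left, or nullptr
};

// Number of links that must be followed to walk the whole qualifier.
inline std::size_t chainLength(const OldSpecifier *s) {
  std::size_t n = 0;
  for (; s; s = s->prefix) ++n;
  return n;
}

// Spells the chain left to right by recursing to the outermost prefix first.
inline std::string spell(const OldSpecifier *s) {
  if (!s) return "";
  return spell(s->prefix) + s->text + "::";
}

// Sizes in bytes of the seven allocations, as quoted in the list above.
constexpr std::size_t kOldSizes[] = {16, 16, 48, 16, 16, 32, 16};

constexpr std::size_t sumOldSizes() {
  std::size_t sum = 0;
  for (std::size_t s : kOldSizes) sum += s;
  return sum;
}

static_assert(sumOldSizes() == 160, "matches the total quoted in the text");
```

Note that the list is stored innermost-first, so spelling the name in source order means walking to the end of the chain before emitting anything.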

The real problem wasn’t just size — it was the uniquing cost. Every prospective node has to be looked up in a hash table for a pre-existing instance.
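
For readers unfamiliar with uniquing, here is a minimal sketch of the pattern, not Clang's actual implementation. Node, Key, KeyHash, and UniquingTable are hypothetical names. Every request first probes a hash table, and a node is only allocated when no equal one exists, so equal qualifiers end up as equal pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>

struct Node {
  std::string name;
  const Node *prefix;
};

// Lookup key. Prefixes are themselves uniqued, so comparing the prefix
// pointer is enough; there is no need to compare the whole chain.
struct Key {
  std::string name;
  const Node *prefix;
  bool operator==(const Key &o) const {
    return name == o.name && prefix == o.prefix;
  }
};

struct KeyHash {
  std::size_t operator()(const Key &k) const {
    return std::hash<std::string>()(k.name) ^
           (std::hash<const Node *>()(k.prefix) * 31);
  }
};

class UniquingTable {
  std::unordered_map<Key, std::unique_ptr<Node>, KeyHash> nodes;
  std::size_t lookups = 0;

public:
  // Returns the single shared node for (name, prefix), creating it if needed.
  const Node *get(const std::string &name, const Node *prefix) {
    ++lookups;  // every request pays for a hash lookup, hit or miss
    auto &slot = nodes[Key{name, prefix}];
    if (!slot) slot = std::make_unique<Node>(Node{name, prefix});
    return slot.get();
  }

  std::size_t size() const { return nodes.size(); }
  std::size_t lookupCount() const { return lookups; }
};
```

In this model, building a four-element qualifier costs four lookups even when every node already exists, which is the per-node overhead the paragraph above describes.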

To make matters worse, ElaboratedType nodes sometimes leaked into these chains, which wasn’t supposed to happen and led to several long-standing bugs.

A new, smarter `NestedNameSpecifier`

After this patch, NestedNameSpecifier becomes a compact, tagged pointer — just one machine word wide.

The pointer uses 8-byte alignment, leaving three spare bits. Two bits are used for kind discrimination, and one remains available for arbitrary use.

When non-null, the tag bits encode:

A type
A declaration (either a __super class or a namespace)
A namespace prefixed by the global scope (::Namespace)
A special object combining a namespace with its prefix

When null, the tag bits instead encode:

An empty nested name (the terminator)
The global name
An invalid/tombstone entry (for hash tables)
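
The encoding described above can be sketched as a one-word tagged pointer. This is an illustration under assumptions, not the patch's code: ToySpecifier and Payload are hypothetical, and the specific tag values assigned to each kind are invented for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical pointee. 8-byte alignment guarantees three zero low bits.
struct alignas(8) Payload { int id; };

class ToySpecifier {
public:
  // The first three kinds use a null pointer, the rest a non-null one.
  enum class Kind { Empty, Global, Invalid,
                    Type, Decl, GlobalNamespace, NamespaceAndPrefix };

private:
  static constexpr std::uintptr_t kTagMask = 0x3;  // two low bits: kind
  static constexpr std::uintptr_t kPtrMask = ~std::uintptr_t{0x7};
  // The third low bit (0x4) is left free, mirroring the spare bit in the text.

  std::uintptr_t bits = 0;
  explicit ToySpecifier(std::uintptr_t b) : bits(b) {}

public:
  ToySpecifier() = default;  // null pointer, tag 0: the empty terminator
  static ToySpecifier global() { return ToySpecifier(1); }   // null, tag 1
  static ToySpecifier invalid() { return ToySpecifier(2); }  // null, tag 2

  static ToySpecifier make(Kind k, const Payload *p) {
    auto raw = reinterpret_cast<std::uintptr_t>(p);
    assert(p && (raw & ~kPtrMask) == 0 && "non-null, 8-byte aligned payload");
    assert(k >= Kind::Type && "pointer kinds only");
    auto tag = static_cast<std::uintptr_t>(k) -
               static_cast<std::uintptr_t>(Kind::Type);
    return ToySpecifier(raw | tag);
  }

  const Payload *pointer() const {
    return reinterpret_cast<const Payload *>(bits & kPtrMask);
  }

  // The same two tag bits mean different things for null and non-null.
  Kind kind() const {
    std::uintptr_t tag = bits & kTagMask;
    if (pointer() == nullptr) return static_cast<Kind>(tag);
    return static_cast<Kind>(tag + static_cast<std::uintptr_t>(Kind::Type));
  }

  bool operator==(ToySpecifier o) const { return bits == o.bits; }
};

static_assert(sizeof(ToySpecifier) == sizeof(void *), "one machine word");
```

Because the whole value is a single word, copying and comparing a specifier is as cheap as copying and comparing a pointer, and the null encodings need no allocation at all.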

Other changes include:

The “unknown identifier” case is now represented by a DependentNameType.
Type prefixes are handled directly in the type hierarchy.

Revisiting the earlier example, after the patch its AST dump becomes:

DependentNameType 'Namespace::Class::NestedClassTemplate<T>::XX' dependent
`-NestedNameSpecifier TemplateSpecializationType 'Namespace::Class::NestedClassTemplate<T>' dependent
  `-name: 'Namespace::Class::NestedClassTemplate' qualified
    |-NestedNameSpecifier RecordType 'Namespace::Class'
    | |-NestedNameSpecifier Namespace 'Namespace'
    | `-CXXRecord 'Class'
    `-ClassTemplate NestedClassTemplate

This representation now requires only four allocations (152 bytes total):

DependentNameType – 48 bytes
TemplateSpecializationType – 48 bytes
QualifiedTemplateName – 16 bytes
RecordType – 40 bytes

That’s almost half the number of nodes.

While DependentNameType is larger than the previous 16-byte “identifier” node, the additional space isn’t wasted — it holds cached answers to common queries such as “does this type reference a template parameter?” or “what is its canonical form?”.

These caches make those operations significantly cheaper, further improving performance.
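
As a rough illustration of why such a cache pays off, consider a toy type node that computes the answer to "does this reference a template parameter?" once, at construction. ToyType and both query functions are hypothetical and much simpler than Clang's real dependence tracking.

```cpp
#include <cassert>
#include <utility>
#include <vector>

struct ToyType {
  bool isTemplateParam;
  std::vector<const ToyType *> children;
  bool cachedDependent;  // answer computed once and stored in the node

  ToyType(bool isParam, std::vector<const ToyType *> kids)
      : isTemplateParam(isParam), children(std::move(kids)),
        cachedDependent(isParam) {
    // Children are built first, so their cached bits are already valid.
    for (const ToyType *c : children)
      cachedDependent = cachedDependent || c->cachedDependent;
  }

  // Constant-time query: just read the stored bit.
  bool referencesTemplateParam() const { return cachedDependent; }
};

// Uncached alternative: walks the subtree on every query and counts the
// nodes it had to visit.
inline bool referencesTemplateParamSlow(const ToyType &t, int &visited) {
  ++visited;
  if (t.isTemplateParam) return true;
  for (const ToyType *c : t.children)
    if (referencesTemplateParamSlow(*c, visited)) return true;
  return false;
}
```

The cached version spends a few extra bits per node to turn a repeated tree walk into a single load, which is the trade the larger DependentNameType makes.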

Wrapping up

There’s more in the patch than what I’ve covered here, including:

RecordType now points directly to the declaration found at creation, enriching the AST without measurable overhead.
RecordType nodes are now created lazily.
The redesigned NestedNameSpecifier simplified several template instantiation transforms.

Each of these could warrant its own write-up, but even this high-level overview shows how careful structural changes in the AST can lead to tangible compile-time wins.

I hope you found this deep dive into Clang’s internals interesting — and that it gives a glimpse of the kind of small, structural optimizations that add up to real performance improvements in large C++ builds.