Compiling Ruby to machine language


Original link: https://patshaughnessy.net/2025/11/17/compiling-ruby-to-machine-language

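## Ruby 3.x Performance: An Introduction to YJIT and ZJIT

A new edition of *Ruby Under a Microscope* is in the works. It focuses on Ruby 3.x and its performance improvements, in particular the YJIT and ZJIT JIT compilers. These compilers speed up Ruby code by converting frequently executed sections into machine language. YJIT works by tracking how often functions and blocks are called. Once a call count reaches a threshold (by default 30 for small programs and 120 for large ones), YJIT compiles that code into optimized "YJIT blocks", which are sequences of machine language instructions. It compiles small sections at first, uses "branch stubs" to handle cases where data types are not yet known, and observes runtime behavior to specialize the compiled code. The next-generation ZJIT builds on this foundation. Both YJIT and ZJIT count method and block calls to identify the "hot spots" to compile. The author explores how these compilers work internally, including examining the machine code they generate and putting his Rust skills to use, and highlights the impressive work of the Ruby team at Shopify and other contributors. The goal is a detailed explanation of how Ruby code is transformed to improve runtime performance.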

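## Discussion: Ruby Compilation and Performance

A Hacker News discussion centered on compiling Ruby to machine language, prompted by the upcoming new edition of Pat Shaughnessy's *Ruby Under a Microscope*. The conversation touched on past attempts such as MacRuby (which used LLVM but was displaced by Swift) and RubyMotion, as well as current just-in-time (JIT) compilation efforts such as YJIT and ZJIT. Users debated Ruby's performance relative to other languages, noting that it is faster than Python but generally slower than Lua and Node.js. Although Ruby has become significantly faster over the past decade, its relative performance ranking has stayed fairly stable. The discussion also highlighted the challenges of compiling Ruby, including memory consumption and the difficulty of optimizing a dynamically typed language. Many commenters praised Shaughnessy's work for making Ruby internals easier to understand, and one user mentioned a personal project building an ahead-of-time (AOT) Ruby compiler. Overall, the consensus was that Ruby is "fast enough" for many applications but still has considerable room for performance improvement.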

Original article

I've started working on a new edition of Ruby Under a Microscope that covers Ruby 3.x. I'm working on this in my spare time, so it will take a while. Leave a comment or drop me a line and I'll email you when it's finished.

Here’s an excerpt from the completely new content for Chapter 4, about YJIT and ZJIT. I’m still finishing this up… so this content is fresh off the page! It’s been a lot of fun for me to learn about how JIT compilers work and to brush up on my Rust skills as well. And it’s very exciting to see all the impressive work the Ruby team at Shopify and other contributors have done to improve Ruby’s runtime performance.

Chapter 4: Compiling Ruby To Machine Language

Interpreting vs. Compiling Ruby Code
Yet Another JIT (YJIT)
Virtual Machines and Actual Machines
Counting Method and Block Calls
YJIT Blocks
YJIT Branch Stubs
Executing YJIT Blocks and Branches
Deferred Compilation
Regenerating a YJIT Branch
YJIT Guards
Adding Two Integers Using Machine Language
Experiment 4-1: Which Code Does YJIT Optimize?
How YJIT Recompiles Code
Finding a Block Version
Saving Multiple Block Versions
ZJIT, Ruby’s Next Generation JIT
Counting Method and Block Calls
ZJIT Blocks
Method Based JIT
Rust Inside of Ruby
Experiment 4-2: Reading ZJIT HIR and LIR
Summary

Counting Method and Block Calls

To find hot spots, YJIT counts how many times your program calls each function or block. When this count reaches a certain threshold, YJIT stops your program and converts that section of code into machine language. Later, Ruby will execute the machine language version instead of the original YARV instructions.

To keep track of these counts, YJIT saves an internal counter near the YARV instruction sequence for each function or block.

Figure 4-5 shows the YARV instruction sequence the main Ruby compiler created for the sum += i block at (3) in Listing 4-1. At the top, above the YARV instructions, Figure 4-5 shows two YJIT-related values: jit_entry and jit_entry_calls. As we’ll see in a moment, jit_entry starts as a null value but will later hold a pointer to the machine language instructions YJIT produces for this Ruby block. Below jit_entry, Figure 4-5 also shows jit_entry_calls, YJIT’s internal counter.
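Listing 4-1 itself isn’t reproduced in this excerpt, so the following sketch assumes it sums the integers 1 through 40 by passing a sum += i block to Range#each. Storing the block in a proc is a choice made here (not necessarily in the listing) so that `RubyVM::InstructionSequence.of` can find it; `disasm` then prints the YARV instructions CRuby compiled for the block, which should resemble the sequence in Figure 4-5 (without the YJIT fields). The exact output varies between Ruby versions, and the code only runs on CRuby.

```ruby
# Sketch of a program like Listing 4-1 (assumed; the listing isn't shown here).
sum = 0
add_to_sum = proc { |i| sum += i }   # the "sum += i" block
(1..40).each(&add_to_sum)            # Range#each calls the block 40 times
puts sum                             # prints 820

# CRuby only: print the YARV instructions compiled for the block.
puts RubyVM::InstructionSequence.of(add_to_sum).disasm
```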

Each time the program in Listing 4-1 calls this block, YJIT increments the value of jit_entry_calls. Since the range at (1) in Listing 4-1 spans from 1 through 40, this counter will start at zero and increase by 1 each time Range#each calls the block at (3).

When jit_entry_calls reaches a particular threshold, YJIT will compile the YARV instructions into machine language. By default, YJIT in Ruby 3.5 uses a threshold of 30 for small Ruby programs. Larger programs, like Ruby on Rails web applications, use a larger threshold of 120. (You can also change the threshold by passing --yjit-call-threshold when you run your Ruby program.)
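The counting behavior can be modeled in a few lines of Ruby. This is a simplified sketch, not YJIT’s implementation (YJIT is written in Rust and stores these values alongside the instruction sequence): `ISeqModel` and its methods are hypothetical, a string stands in for the machine code pointer, and the model assumes that once jit_entry is set, later calls run the compiled code and no longer bump the counter.

```ruby
# Simplified, hypothetical model of YJIT's call counting.
class ISeqModel
  attr_reader :jit_entry, :jit_entry_calls

  def initialize(name, threshold: 30)
    @name = name
    @threshold = threshold
    @jit_entry = nil       # null until YJIT compiles this code
    @jit_entry_calls = 0   # YJIT's internal call counter
  end

  # Called every time the program invokes this function or block.
  def call
    if @jit_entry.nil?
      @jit_entry_calls += 1
      # Reaching the threshold triggers compilation; the string stands in
      # for a pointer to the generated machine code.
      @jit_entry = "machine code for #{@name}" if @jit_entry_calls >= @threshold
    end
    @jit_entry ? :machine_code : :yarv
  end
end

block = ISeqModel.new("sum += i")
modes = (1..40).map { block.call }
p modes.count(:yarv)           # => 29
p modes.count(:machine_code)   # => 11
```

With a real YJIT-enabled Ruby, the threshold can be set together with `--yjit`, for example `ruby --yjit --yjit-call-threshold=30 listing_4_1.rb`, where the file name is only a placeholder.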

YJIT Blocks

While compiling your Ruby program, YJIT saves the machine language instructions it creates into YJIT blocks. YJIT blocks, which are distinct from Ruby blocks, each contain a sequence of machine language instructions for a range of corresponding YARV instructions. By grouping YARV instructions and compiling each group into a YJIT block, YJIT can produce more optimized code that is tailored to your program’s behavior and avoid compiling code that your program doesn’t need.

As we’ll see next, a single YJIT block doesn’t correspond to a Ruby function or block. YJIT blocks instead represent smaller sections of code: individual YARV instructions or a small range of YARV instructions. Each Ruby function or block typically consists of several YJIT blocks.

Let’s see how this works for our example. After the program in Listing 4-1 executes the Ruby block at (3) 29 times, YJIT will increment the jit_entry_calls counter again, just before Ruby runs the block for the 30th time. Since jit_entry_calls reaches the threshold value of 30, YJIT triggers the compilation process.

YJIT compiles the first YARV instruction, getlocal_WC_1, and saves machine language instructions that perform the same work as getlocal_WC_1 into a new YJIT block, as shown in Figure 4-6.

On the left side, Figure 4-6 shows the YARV instructions for the sum += i Ruby block. On the right, Figure 4-6 shows the new YJIT block corresponding to getlocal_WC_1.

Next, the YJIT compiler continues and compiles the second YARV instruction from the left side of Figure 4-7: getlocal_WC_0 at index 2.

On the left side, Figure 4-7 shows the same YARV instructions for the sum += i Ruby block that we saw above in Figure 4-6. But now the two dotted arrows indicate that the YJIT block on the right contains the machine language instructions equivalent to both getlocal_WC_1 and getlocal_WC_0.
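One way to picture a YJIT block is as a record pairing a range of YARV instruction indexes with the machine code compiled for them. The `YJITBlock` struct below is a hypothetical illustration, not YJIT’s actual data structure, and its strings are placeholders for the ARM64 instructions, which aren’t reproduced here. It models the state in Figure 4-7, where one block covers indexes 0 through 2.

```ruby
# Hypothetical model of a YJIT block: a range of YARV instruction indexes and
# the machine code generated for them (placeholder strings, not real ARM64).
YJITBlock = Struct.new(:first_index, :last_index, :machine_code) do
  def covers?(yarv_index)
    yarv_index.between?(first_index, last_index)
  end
end

# The first three YARV instructions of the "sum += i" block, by index.
yarv = { 0 => :getlocal_WC_1, 2 => :getlocal_WC_0, 4 => :opt_plus }

# Compile indexes 0 through 2 into a single block, as in Figure 4-7.
compiled = yarv.select { |index, _| index <= 2 }
block = YJITBlock.new(0, 2, compiled.map { |index, insn| "<code for #{insn} at #{index}>" })

p block.covers?(2)            # => true
p block.covers?(4)            # => false (opt_plus isn't compiled yet)
p block.machine_code.length   # => 2
```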

Let’s take a look inside this new block. YJIT compiles, or translates, the Ruby YARV instructions into machine language instructions. In this example, running on my Mac laptop, YJIT writes the machine language instructions shown in Figure 4-8 into this new block.

Figure 4-8 shows a closer view of the new YJIT block that appeared on the right side of Figures 4-6 and 4-7. Inside the block, Figure 4-8 shows the assembly language mnemonics corresponding to the ARM64 machine language instructions that YJIT generated for the two YARV instructions shown on the left. Those YARV instructions are getlocal_WC_1, which loads a value from a local variable located in the previous stack frame and saves it on the YARV stack, and getlocal_WC_0, which loads a local variable from the current stack frame and also saves it on the YARV stack. The machine language instructions on the right side of Figure 4-8 perform the same task, loading these values into two registers on my M1 microprocessor: x1 and x9. If you’re curious and would like to learn more about what these machine language instructions mean and how they work, the section “Adding Two Integers Using Machine Language” discusses the instructions for this example in more detail.
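To make the difference between the two load instructions concrete, here is a toy Ruby model of just those two YARV instructions. `Frame`, `run_loads`, and the hash-based local tables are hypothetical simplifications: CRuby’s real frames and local tables work differently, and the sketch says nothing about the ARM64 code YJIT emits. The only point is that the _WC_1 variant reads from the enclosing (previous) frame while the _WC_0 variant reads from the current one, and both push onto the YARV stack.

```ruby
# Toy model of two YARV load instructions (a simplified sketch).
Frame = Struct.new(:locals, :parent)

# Executes only getlocal_WC_0 / getlocal_WC_1 and returns the YARV stack.
def run_loads(insns, frame)
  stack = []
  insns.each do |op, name|
    case op
    when :getlocal_WC_0 then stack.push(frame.locals.fetch(name))        # current frame
    when :getlocal_WC_1 then stack.push(frame.parent.locals.fetch(name)) # previous frame
    else raise ArgumentError, "unsupported instruction: #{op}"
    end
  end
  stack
end

main_frame  = Frame.new({ sum: 10 }, nil)      # the frame that owns `sum`
block_frame = Frame.new({ i: 5 }, main_frame)  # the block's frame, which owns `i`
stack = run_loads([[:getlocal_WC_1, :sum], [:getlocal_WC_0, :i]], block_frame)
p stack   # => [10, 5]
```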

YJIT Branch Stubs

Next, YJIT continues down the sequence of YARV instructions and compiles the opt_plus YARV instruction at index 4 in Figures 4-6 and 4-7. But this time, YJIT runs into a problem: It doesn’t know the type of the addition arguments. That is, will opt_plus add two integers? Or two strings, floating point numbers, or some other types?

Machine language is very specific. To add two 64-bit integers on an M1 microprocessor, YJIT could use the adds assembly language instruction. But adding two floating point numbers would require different instructions. And, of course, adding or concatenating two strings is an entirely different operation.

In order for YJIT to know which machine language instructions to save into the YJIT block for opt_plus, YJIT needs to know exactly what type of values the Ruby program might ever add at (3) in Listing 4-1. You and I can tell by reading Listing 4-1 that the Ruby code is adding integers. We know right away that the sum += i block at (3) is always adding one integer to another. But YJIT doesn’t know this.
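Part of the problem is visible in the YARV code itself: the Ruby compiler emits the same opt_plus instruction no matter what types the operands turn out to be. The sketch below (CRuby only, since it relies on `RubyVM::InstructionSequence`) compiles three small snippets that add integers, floats, and strings, and checks that each one uses opt_plus. The snippets are made up for illustration.

```ruby
# Compile three snippets that add different types (CRuby only).
snippets = {
  integers: "a = 1;   b = 2;   a + b",
  floats:   "a = 1.5; b = 2.5; a + b",
  strings:  "a = 'x'; b = 'y'; a + b"
}

snippets.each do |kind, src|
  disasm = RubyVM::InstructionSequence.compile(src).disasm
  # The same instruction appears in every case; nothing in it records the types.
  puts "#{kind}: uses opt_plus? #{disasm.include?('opt_plus')}"
end
```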

YJIT uses a clever trick to solve this problem. Instead of analyzing the entire program ahead of time to determine all of the possible types of values the opt_plus YARV instruction might ever need to add, YJIT simply waits until the block runs and observes which types the program actually passes in.

YJIT uses branch stubs to achieve this wait-and-see compile behavior, as shown in Figure 4-9.

Figure 4-9 shows the YARV instructions on the left, and the YJIT block for indexes 0000-0002 on the right. But note the bottom right corner of Figure 4-9, which shows an arrow pointing down from the block to a box labeled stub. This arrow represents a YJIT branch. Since this new branch doesn’t point to a block yet, YJIT sets up the branch to point to a branch stub instead.
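The wait-and-see idea behind branch stubs can be sketched as lazy compilation. In the hypothetical Ruby model below, a `Branch` starts out pointing at `:stub`. The first time the program reaches it, the stub records the classes of the operands actually passed in, asks a compiler callback for code specialized to them, and repoints the branch so later executions skip the stub. The lambda only stands in for generated machine code, and the model ignores what happens if the operand types change later.

```ruby
# Simplified model of a YJIT branch that starts out pointing at a stub.
# Branch, compiled_for, and compile_count are hypothetical names.
class Branch
  attr_reader :target, :compiled_for, :compile_count

  def initialize(&compiler)
    @compiler = compiler   # invoked lazily with the observed operand classes
    @target = :stub        # no compiled block yet
    @compile_count = 0
  end

  def execute(lhs, rhs)
    if @target == :stub
      # Wait and see: compile only now, using the types actually passed in.
      @compiled_for = [lhs.class, rhs.class]
      @target = @compiler.call(*@compiled_for)
      @compile_count += 1
    end
    @target.call(lhs, rhs)   # later executions jump straight to the compiled code
  end
end

branch = Branch.new do |lhs_class, rhs_class|
  puts "compiling opt_plus for #{lhs_class} + #{rhs_class}"
  ->(a, b) { a + b }       # stand-in for the generated machine code
end

results = [[1, 2], [3, 4]].map { |a, b| branch.execute(a, b) }
p results                  # => [3, 7]
p branch.compiled_for      # => [Integer, Integer]
```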