XOR'ing a register with itself is the idiom for zeroing it out. Why not sub?

Original link: https://devblogs.microsoft.com/oldnewthing/20260421-00/?p=112247

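Raymond Chen's article follows up on Matt Godbolt's explanation of why x86 compilers favor `xor eax, eax` for zeroing a register, and asks why `sub eax, eax` did not win instead: it has the same encoded size, takes the same number of cycles, and has arguably better flag behavior. Compactness explains why both beat `mov eax, 0`, since neither has to encode a constant, but it does not separate `xor` from `sub`. Chen suspects a historical "swarming" effect: early compilers emitted `xor`, programmers copied them, and a feedback loop formed. Intel eventually special-cased both `xor r, r` and `sub r, r`, renaming the destination to an internal zero register so that the instruction effectively executes in zero cycles. Concern that other CPU makers might optimize only `xor` further entrenched it, even though the difference is largely insignificant. The episode shows how even a slight initial advantage can lead to widespread adoption in programming practice.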

Original text

Matt Godbolt, probably best known for being the proprietor of Compiler Explorer, wrote a brief article on why x86 compilers love the xor eax, eax instruction.

The answer is that it is the most compact way to set a register to zero on x86. In particular, it is several bytes shorter than the more obvious mov eax, 0 since it avoids having to encode the four-byte constant. The x86 architecture does not have a dedicated zero register, so if you need to zero out a register, you’ll have to do it ab initio.
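
The size difference can be made concrete by listing the machine-code bytes. The sketch below is only an illustration: the table name `ENCODINGS` and the helper `encoded_length` are made up for this example, and the bytes shown are one common 32-bit encoding of each instruction (an assembler may equally emit `33 C0` for the xor or `2B C0` for the sub, which have the same length).

```python
# One common encoding of each way to put zero in EAX (32-bit operand size).
# ENCODINGS and encoded_length are hypothetical names used for illustration.
ENCODINGS = {
    "mov eax, 0":   bytes([0xB8, 0x00, 0x00, 0x00, 0x00]),  # opcode B8+rd, then a 4-byte immediate
    "xor eax, eax": bytes([0x31, 0xC0]),                    # opcode 31 /r, ModRM C0 (eax, eax)
    "sub eax, eax": bytes([0x29, 0xC0]),                    # opcode 29 /r, ModRM C0 (eax, eax)
}


def encoded_length(instruction):
    """Return the number of machine-code bytes for a listed instruction."""
    return len(ENCODINGS[instruction])


for text, code in ENCODINGS.items():
    print(f"{text:<14} {code.hex(' '):<16} {len(code)} bytes")
```

Almost all of the extra size in the mov form is the four-byte constant; the register-to-register forms need only an opcode byte and a ModRM byte.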

But Matt doesn’t explain why everyone chooses xor as opposed to some other mathematical operation that is guaranteed to result in a zero. In particular, what’s wrong with sub eax, eax? It encodes to the same number of bytes and executes in the same number of cycles. And its behavior with respect to flags is even better:

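| Flag | xor eax, eax | sub eax, eax |
|------|--------------|--------------|
| OF   | clear        | clear        |
| SF   | clear        | clear        |
| ZF   | set          | set          |
| AF   | undefined    | clear        |
| PF   | set          | set          |
| CF   | clear        | clear        |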

Observe that xor eax, eax leaves the AF flag undefined, whereas sub eax, eax clears it.
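
The flag results can be reproduced with a small model of the two instructions. This is a sketch under stated assumptions rather than a faithful emulator: it assumes unsigned 32-bit operands, the function names are invented for this example, and an architecturally undefined flag is represented as `None`.

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000


def parity_flag(result):
    # PF is set when the low byte of the result has an even number of 1 bits.
    return bin(result & 0xFF).count("1") % 2 == 0


def common_flags(result):
    # SF, ZF and PF depend only on the result for both instructions.
    return {"SF": bool(result & SIGN32), "ZF": result == 0, "PF": parity_flag(result)}


def flags_after_xor(a, b):
    """Model the flags after `xor a, b` on unsigned 32-bit operands."""
    result = (a ^ b) & MASK32
    flags = common_flags(result)
    # XOR always clears OF and CF; AF is undefined, modeled here as None.
    flags.update(OF=False, CF=False, AF=None)
    return flags


def flags_after_sub(a, b):
    """Model the flags after `sub a, b` on unsigned 32-bit operands."""
    result = (a - b) & MASK32
    flags = common_flags(result)
    flags.update(
        OF=bool((a ^ b) & (a ^ result) & SIGN32),  # signed overflow
        CF=a < b,                                  # unsigned borrow
        AF=(a & 0xF) < (b & 0xF),                  # borrow out of the low nibble
    )
    return flags


for value in (0, 1, 0x7FFFFFFF, 0xDEADBEEF):
    print(hex(value), flags_after_xor(value, value), flags_after_sub(value, value))
```

Whatever value the register held, subtracting it from itself produces no borrow at any bit position, which is why sub yields a defined, clear AF where xor leaves it undefined.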

I don’t know why xor won the battle, but I suspect it was just a case of swarming.

In my hypothetical history, xor and sub started out with roughly similar popularity, but xor took a slight lead due to some fluke, perhaps because it felt more “clever”.

When early compilers used xor to zero out a register, this started the snowball, because people would see the compiler generate xor and think, “Well, those compiler writers are smart, they must know something I don’t. Since I was on the fence between xor and sub, this tiny data point is enough to tip it toward xor.”

The predominance of these idioms as a way to zero out a register led Intel to add special detection of xor r, r and sub r, r in the instruction decoding front-end, which renames the destination to an internal zero register and bypasses the execution of the instruction entirely. You can imagine that the instruction, in some sense, “takes zero cycles to execute”. The front-end detection also breaks dependency chains: Normally, the output of an xor or sub is dependent on its inputs, but in this special case of xor'ing or sub'ing a register with itself, we know that the output is zero, independent of input.
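
The dependency-breaking effect can be pictured with a toy model. The sketch below is not how any real front-end is implemented; `ZERO_IDIOMS` and `input_dependencies` are hypothetical names, and the model covers only two-operand register-to-register ALU instructions.

```python
# Toy model: which prior register values must a two-operand ALU instruction wait for?
# ZERO_IDIOMS and input_dependencies are hypothetical names used for illustration.
ZERO_IDIOMS = {"xor", "sub"}


def input_dependencies(mnemonic, dst, src):
    """Return the set of registers whose earlier values the instruction needs."""
    if mnemonic in ZERO_IDIOMS and dst == src:
        # The result is zero regardless of the old register value,
        # so there is nothing to wait for.
        return set()
    # Ordinary case: `op dst, src` reads both operands.
    return {dst, src}


print(input_dependencies("xor", "eax", "eax"))  # no dependencies
print(input_dependencies("sub", "eax", "eax"))  # no dependencies
print(input_dependencies("xor", "eax", "ebx"))  # depends on eax and ebx
```

In this model a zeroing idiom can start a fresh dependency chain even if the previous producer of the register is still in flight, which is the property the front-end detection provides.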

Even though Intel added support for both xor-detection and sub-detection, Stack Overflow worries that other CPU manufacturers may have special-cased xor but not sub, so that makes xor the winner in this ultimately meaningless battle.

Once an instruction has an edge, even if only extremely slight, that’s enough to tip the scales and rally everyone to that side.

Bonus chatter: One of my former colleagues was partial to using sub r, r to zero a register, and when I was reading assembly code, I could tell that he was the author due to the use of sub to zero a register rather than the more popular xor.

Bonus bonus chatter: The xor trick doesn’t work for Itanium because mathematical operations don’t reset the NaT bit. Fortunately, Itanium also has a dedicated zero register, so you don’t need this trick. You can just move zero into your desired destination.
