解构汇编代码中的函数

解构汇编代码中的函数
When functions dissolve (2020)

原始链接: https://rubber-duck-typing.com/posts/2020-12-12-when-functions-dissolve.html

## 尾调用优化总结尾调用发生在函数的最后一个动作是调用另一个函数并直接返回其结果时。这与函数在调用函数返回*之后*进行计算的情况不同。重要的是，尾调用提供了优化的潜力。通常，每个函数调用都会在调用栈上添加一个新的返回地址。然而，在尾调用中，被调用函数返回后，当前函数没有进一步的计算需要执行。因此，当前函数的返回地址是不必要的。尾调用可以优化为简单的跳转到下一个函数，而不是将另一个返回地址压入栈中。这有效地将函数调用替换为分支，避免栈增长并提高性能。示例说明了`print_newline`如何可以直接跳转到`print_char`，从而无需从`print_newline`单独返回，并简化执行。这种优化对于递归函数尤其有价值。

一个黑客新闻的讨论围绕着文章“当函数消解”（rubber-duck-typing.com），重点是尾调用优化和协程。最初的帖子引发了一场关于协程定义的讨论。一位评论者正确地指出，协程不是像子例程那样*调用*，而是被*恢复*执行，并且可能将状态作为参数传递。这与子例程具有多个入口点形成对比，而多个入口点不一定意味着是协程。讨论强调了协程的历史背景，追溯到梅尔文·康威在1963年的定义，并将它们与FORTRAN和PL/I等语言中早期的多入口子例程或线程区分开来。简短的评论也暗示了在组件级别强制模块化的可能性。

原文

Tail call happens when the subroutine ends with a call to another subroutine. In the higher level language this happens in cases such as:

void g();

void f() {
    g();
    return;
}

This is also a tail call, because we just return what another function returns:

int g(int x);

int f() {
    return g(42);
}

This is not a tail call: after calling function we have to multiply its result to another number. We wait until fact(n-1) completes its execution, and then use its result in other computations. Note that in this example the function calls itself rather than some other function, but this is unimportant.

int fact( int n ) {
  if (n < 2) return 1;
  return n * fact(n-1);
}

On the assembly level, a tail call corresponds to the following pattern of instructions:

This pair of instructions is located in a function which we are going to call f. The control reaches this function from some other function, say, f_caller.

Let us follow the state of stack through the execution of f_caller, f and other. For that we expand this code to include f_caller and other functions.

f_caller:

...
   call f
   <next instruction> ; mark 4
...

f:            ; mark 1
  ...
  call other
  ret         ; mark 3
...
other:        ; mark 2
   ...
   ret

When we start executing f the stack holds the return address inside f_caller (we reach mark 1).
When we call other the stack holds the return addresses for f_caller, then f on top (we reach mark 2).
The subroutine other returns too, in this moment we have only return address for f_caller on top of the stack (we reach mark 3).
The subroutine f returns, popping return address of f_caller from the stack (we reach mark 4).

The last two steps were essentially popping two return addresses from stack consecutively. It suggests that the first one (the return address to f) is useless and we do not need to store it. We are indeed right.

The call other instruction is equivalent to the pair of pseudoinstructions:

push rip    ; rip is the program counter register
jmp other

If we do not need to push return address to f, then we can just substitute this instruction for jmp and get rid of one ret:

f_caller:

...
   call f
   <next instruction>
...

f:
  ...
  jmp other
...
other:
   ...
   ret     ; will return directly to f_caller

The subroutine other becomes essentially the continuation of the subroutine f; we pass from f to other via a simple branching instruction.

Coming back to our example, we can rewrite it like that:

print_char:
   ...
   ret

print_newline:
    mov rdi, 0xA          ; code
    jmp print_char

解构汇编代码中的函数 When functions dissolve (2020)

解构汇编代码中的函数
When functions dissolve (2020)