I very much agree with that, especially since - like you said - code that looks like it will perform well doesn't always do so.

That being said, I'd like to add that in my opinion performance measurement results should not be the only guiding principle. You said it yourself: "Another reason is that the CPUs of today are optimized [..]" The important word is "today". CPUs evolved and still do, and a calling convention should be designed for the long term. Sadly, that means it is beneficial not to deviate too much from what C++ does [1], because future processor optimizations are likely to be targeted in that direction. Apart from that, it might be worthwhile to consider general principles that are not likely to change (e.g. conserving argument registers, as you mentioned), to make the calling convention robust and future-proof.

[1] It feels a bit strange to say that, because I think Rust has become a bit too conservative in recent years when it comes to its weirdness budget (https://steveklabnik.com/writing/the-language-strangeness-bu...). You cannot be better without being different, after all.
There were proposals for optimizing this kind of stuff for C++, in particular for error handling, like: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p07...

> Throwing such values behaves as-if the function returned union{R;E;}+bool where on success the function returns the normal return value R and on error the function returns the error value type E, both in the same return channel including using the same registers. The discriminant can use an unused CPU flag or a register
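In Rust terms, the as-if return shape that the proposal describes can be sketched roughly like this (the types `Payload`/`RawResult` and the concrete choices `R = i64`, `E = u32` are my own illustration, not from the paper):

```rust
use std::mem::size_of;

// Hypothetical success type R = i64 and error type E = u32,
// overlapped in one return channel as the proposal's union{R;E;}.
#[repr(C)]
union Payload {
    ok: i64,  // R: the normal return value
    err: u32, // E: the error value
}

// The "+bool" discriminant; the proposal suggests it could live in an
// unused CPU flag or a register instead of memory.
#[repr(C)]
struct RawResult {
    payload: Payload,
    is_err: bool,
}

fn main() {
    // The union is only as big as its largest member...
    assert_eq!(size_of::<Payload>(), 8);
    // ...and the discriminant costs one more (aligned) slot in memory,
    // which is exactly what keeping it in a flag would avoid.
    assert_eq!(size_of::<RawResult>(), 16);
}
```

This is essentially the layout Rust's own `Result<R, E>` niche-less case compiles to, which is why the proposal maps so naturally onto Rust semantics.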
The LLVM calling conventions for x86 only allow returning 3 integer registers, 4 vector registers, and 2 x87 floating point registers (er, stack slots technically because x87 is weird).
Yeah, `NonZero*`, but also a type like `#[repr(u8)] enum Foo { X }`, according to `assert_eq!(std::mem::size_of::<Option<Foo>>(), 1)`.
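Both niches are easy to check directly; a self-contained sketch that compiles on stable Rust:

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

// A single-variant enum with a u8 representation: only one of the 256
// byte values is ever a valid Foo, leaving 255 niche values.
#[allow(dead_code)]
#[repr(u8)]
enum Foo {
    X,
}

fn main() {
    // NonZeroU32 reserves 0 as a niche, so Option<NonZeroU32>
    // needs no extra discriminant byte.
    assert_eq!(size_of::<Option<NonZeroU32>>(), size_of::<u32>());
    // None fits into one of Foo's 255 unused byte values,
    // so Option<Foo> stays a single byte.
    assert_eq!(size_of::<Option<Foo>>(), 1);
}
```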
Also, most modern processors will easily forward the store to the subsequent load and have a bunch of tricks for tracking the stack state. So how much does putting things in registers help anyway?
Reminds me of Fortran compilers recognising the naive three-nested-loops matrix multiplication and optimising it to something sensible.
Forwarding isn't unlimited, though, as I understand it. The CPU has limited-size queues and buffers through which reordering, forwarding, etc. can happen. So I wouldn't be surprised if using registers well takes pressure off that machinery and ensures that it works as you expect for the data that isn't in registers.

(Looked around randomly to find example data for this.) https://chipsandcheese.com/2022/11/08/amds-zen-4-part-2-memo... claims that Zen 4's store queue only holds 64 entries, for example, and a 512-bit register store eats up two. I can imagine how an algorithm could fill that queue up by juggling enough data.
Tangentially related, there's another "unfortunate" detail of Rust that makes some structs bigger than you want them to be. Imagine a struct Foo that contains eight `Option<u8>` fields: every Option stores its own discriminant byte, so the struct ends up 16 bytes, where a C programmer would pack the eight discriminants into a single byte and get away with 9.

Why? The reason is that structs must be able to present borrows of their fields, so given a `&Foo` the compiler must allow the construction of a `&foo.some_field`, which in this case is an `&Option<u8>` and must therefore point at a real, self-contained `Option<u8>`, discriminant included. Packed discriminants would make such borrows impossible.

This becomes even worse if you consider Options of larger types, where each Option additionally pays alignment padding for its discriminant byte.

You *can* implement the C equivalent manually of course, with a `u8` for the packed discriminants and eight `MaybeUninit<u8>` fields for the payloads: https://play.rust-lang.org/?version=stable&mode=debug&editio...
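To make the size difference concrete, here's a minimal sketch of both layouts (the field type `Option<u8>` is my example choice; `u8` has no niche, so each `Option<u8>` is two bytes):

```rust
use std::mem::{size_of, MaybeUninit};

// Each Option<u8> carries its own one-byte discriminant next to its payload,
// so that a borrow like &foo.a yields a complete, freestanding Option<u8>.
#[allow(dead_code)]
struct Naive {
    a: Option<u8>, b: Option<u8>, c: Option<u8>, d: Option<u8>,
    e: Option<u8>, f: Option<u8>, g: Option<u8>, h: Option<u8>,
}

// The manual C-style layout: one discriminant bit per field in `set`,
// payload bytes valid only while the matching bit is 1. The price is that
// you can no longer hand out an &Option<u8> to any field.
#[allow(dead_code)]
struct Packed {
    set: u8,
    payload: [MaybeUninit<u8>; 8],
}

fn main() {
    assert_eq!(size_of::<Naive>(), 16); // 8 fields x 2 bytes each
    assert_eq!(size_of::<Packed>(), 9); // 1 discriminant byte + 8 payload bytes
}
```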
I just spent a bunch of time on inspect element trying to figure out how the section headings are set at an angle and (at least with Safari tools), I’m stumped. So how did he do this?
In contrast: "How Swift Achieved Dynamic Linking Where Rust Couldn't" (2019) [1]

On the one hand I'm disappointed that Rust still doesn't have a calling convention for Rust-level semantics. On the other hand, the above article demonstrates the tremendous amount of work that's required to get there. Apple was deeply motivated to build this as a requirement to make Swift a viable system language that applications could rely on, but Rust does not have that kind of backing.

[1] https://faultlore.com/blah/swift-abi/ HN discussion: https://news.ycombinator.com/item?id=21488415
Yes, you can use CGO to call Rust functions using extern "C" FFI. I gave a talk about how we use it for GitHub code search at RustConf 2023 (https://www.youtube.com/watch?v=KYdlqhb267c) and afterwards I talked to some other folks (like 1Password) who are doing similar things. It's not a lot of fun because moving types across the C interop boundary is tedious, but it is possible and allows code reuse.
If you want to call from Go into Rust, you can declare any Rust function as `extern "C"` and then call it the same way you would call C from Go. Not sure about going the other way.
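The Rust side of that boundary is just an unmangled `extern "C"` function; a minimal sketch (the `add` symbol name is my example, and the Go/cgo declaration that would import it is not shown):

```rust
// Build this crate as a staticlib or cdylib and the symbol `add` becomes
// callable from C, and therefore from Go via cgo.
#[no_mangle]
pub extern "C" fn add(a: i32, b: i32) -> i32 {
    a + b
}

fn main() {
    // The function remains callable from Rust as well; the ABI annotation
    // only changes the calling convention and symbol mangling, not the body.
    assert_eq!(add(2, 3), 5);
}
```

The tedium the parent comment mentions kicks in as soon as the signature involves anything richer than integers and raw pointers: strings, slices, and `Result`s all have to be flattened into C-representable shapes by hand.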
That's not the direction being talked about here. Try calling the C# method from C or C++ or Rust. (I somewhat recently did try setting up mono to be able to do this... it wasn't fun.)
Nope, because that is a library class without any language support. The pedantic comment is synonymous with proper education instead of urban myths.
It is a library class, because C++ is a rich enough language to implement automatic refcounting as a library class, by hooking into the appropriate lifecycle methods (copy ctor, dtor).
You have to go through C bindings, but FFI is very far from being Go's strongest suit (if we don't count Cgo), so if that's what interests you, it might be better to explore a different language.
Delphi, and I'm sure others, have had[1] this for ages:

> When you declare a procedure or function, you can specify a calling convention using one of the directives register, pascal, cdecl, stdcall, safecall, and winapi.

As in your example, cdecl is for calling C code, while stdcall/winapi is used on Windows for calling Windows APIs.

[1]: https://docwiki.embarcadero.com/RADStudio/Sydney/en/Procedur...
Guess so. Unfamiliar with Zig. The point is that it's not an "all or nothing" strategy for a compilation unit. Debugger writers may not be happy, but maybe lldb supports all the conventions supported by LLVM.
Interesting website - the title text is slanted. Sometimes people who dig deep into the technical details end up being creative with those details.
> Well, not even that (struct returns ... nope).

C compilers actually pack small struct return values into registers: https://godbolt.org/z/51q5se86s It's just limited: on x86-64, GCC and Clang use up to two registers while MSVC only uses one.

Also, IMHO there is no such thing as a "C calling convention"; there are many different calling conventions that are defined by the various runtime environments (usually the combination of CPU architecture and operating system). C compilers must adhere to those CPU+OS calling conventions just like any other language that wants to interact directly with the operating system.

IMHO the whole performance angle is a bit overblown though: for 'high frequency functions' the compiler should inline the function body anyway. And for situations where that's not possible (e.g. calling into DLLs), the DLL should expose an API that doesn't require such 'high frequency functions' in the first place.
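The same packing applies to Rust with a C-compatible layout; a sketch (on x86-64 System V a 16-byte struct like this is returned in the RAX:RDX pair rather than via a hidden out-pointer, per the godbolt link above; the assertions below only check the layout, not the registers):

```rust
use std::mem::size_of;

// Two machine words: small enough for the SysV ABI to classify the
// return value as INTEGER+INTEGER and hand it back in two registers.
#[repr(C)]
#[derive(Clone, Copy, PartialEq, Debug)]
struct Pair {
    a: u64,
    b: u64,
}

// Returned by value; anything larger than two eightbytes would instead
// be written through a caller-provided pointer (sret).
fn make_pair() -> Pair {
    Pair { a: 1, b: 2 }
}

fn main() {
    assert_eq!(size_of::<Pair>(), 16);
    assert_eq!(make_pair(), Pair { a: 1, b: 2 });
}
```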
Clang still has some alternative calling conventions via __attribute__((X)) for individual functions, with a bunch of options[0], though none just extend the set of arguments passed via GPRs (closest seems to be preserve_none, with 12 arguments passed by register, but it also unconditionally gets rid of all callee-saved registers; preserve_most is nice for rarely-taken paths, though until clang-17 it was broken on functions which returned things).

[0]: https://clang.llvm.org/docs/AttributeReference.html#calling-...
Sometimes, what the author calls bad code is actually the fastest thing you can do, for totally non-obvious reasons. The only way to find out is to measure the performance on some large benchmark.
One reason why sometimes bad looking calling conventions perform well is just that they conserve argument registers, which makes the register allocator’s life a tad easier.
Another reason is that the CPUs of today are optimized on traces of instructions generated by C compilers. If you generate code that looks like what the C compiler would do - which passes on the stack surprisingly often, especially if you’re MSVC - then you hit the CPU’s sweet spot somehow.
Another reason is that inlining is so successful, so calls are a kind of unusual boundary on the hot path. It’s fine to have some jank on that boundary if it makes other things simpler.
Not saying that the changes done here are bad, but I am saying that it’s weird to just talk about what looks like weird code without measuring.
(Source: I optimized calling conventions for a living when I worked on JavaScriptCore. I also optimized other things too but calling conventions are quite dear to my heart. It was surprising how often bad-looking pass-on-the-stack code won on big, real code. Weird but true.)
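In the "just measure" spirit, a crude skeleton of such an experiment (the function, argument count, and iteration count are all made up; a real measurement needs a proper harness and a big, realistic workload, not a micro-loop):

```rust
use std::hint::black_box;
use std::time::Instant;

// A callee with more arguments than fit comfortably in registers on some
// ABIs. #[inline(never)] keeps the call boundary (and its argument-passing
// convention) from being optimized away entirely.
#[inline(never)]
fn callee(a: u64, b: u64, c: u64, d: u64, e: u64, f: u64, g: u64) -> u64 {
    a.wrapping_add(b) ^ c.wrapping_mul(d) ^ e ^ f ^ g
}

fn main() {
    let start = Instant::now();
    let mut acc = 0u64;
    for i in 0..1_000_000u64 {
        // black_box stops the compiler from constant-folding the loop away.
        acc ^= callee(black_box(i), i + 1, i + 2, i + 3, i + 4, i + 5, i + 6);
    }
    black_box(acc);
    println!("1e6 calls took {:?}", start.elapsed());
}
```

Even then, as the parent says, a micro-benchmark like this can mislead: whether stack-passing or register-passing wins often only shows up on large real codebases where cache, store-queue, and register pressure interact.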