> compilers behaving in a deterministic and predictable way is an important fundamental of pipelines. LLMs are inherently unpredictable, and so using an LLM for compilation / decompilation -- even an LLM that has 99.99% accuracy

You're confusing different concepts here. An LLM is technically not unpredictable by itself (at least the ones we are talking about here; there are different problems with beasts like GPT-4 [1]). The "randomness" of LLMs you are probably experiencing stems from the autoregressive completion, which samples from the output probabilities at a temperature T > 0 (very common because it makes sense in chat applications). But nothing prevents you from simply choosing greedy sampling, which makes the output 100% deterministic and reproducible.

That is particularly useful for disassembling/decompiling and has a chance to vastly improve over existing tools, because it is common knowledge that they are often not the sharpest tools and humans are much better at piecing together working code.

The other question is accuracy for compiling. For that, what matters is whether the LLM can follow a specification correctly, because once you write unspecified behaviour, your code is fair game for other compilers as well. So the real questions are how well it follows the spec and how well it deals with situations where normal compilers flounder.

[1] https://152334h.github.io/blog/non-determinism-in-gpt-4/
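A minimal sketch of the greedy-decoding point, assuming a Hugging Face `transformers` causal LM; the model name and prompt are illustrative, not the setup from the paper:

```python
# Greedy decoding: pick the argmax token at every step, so repeated runs on
# the same weights produce the same output (no sampling randomness at all).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "codellama/CodeLlama-7b-hf"  # illustrative choice
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "define i32 @add(i32 %a, i32 %b) {"  # toy LLVM IR prompt
inputs = tok(prompt, return_tensors="pt")

# do_sample=False disables temperature/top-p sampling entirely.
out = model.generate(**inputs, do_sample=False, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```

(Bit-exact reproducibility still assumes fixed weights and numerics, which is the separate issue other commenters raise.)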
In parallel computing you run into nondeterminism pretty quickly anyway -- especially with CUDA, because of undetermined execution order and floating-point rounding.
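For the floating-point part specifically, a tiny illustration of why reduction order matters (plain Python, nothing CUDA-specific assumed):

```python
# Floating-point addition is not associative, so a parallel reduction whose
# execution order varies between runs can produce slightly different sums.
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c)                  # 0.6000000000000001
print(a + (b + c))                  # 0.6
print((a + b) + c == a + (b + c))   # False
```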
LLMs can be deterministic if you set the random seed and pin them to a specific version of the weights.

My bigger concern is that bugs in the generated machine code would be very, very difficult to track down.
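For what it's worth, a minimal sketch of that pinning with Hugging Face `transformers` -- the model name is illustrative, and in practice you would pin an exact commit hash of the weights rather than a branch:

```python
from transformers import AutoModelForCausalLM, set_seed

set_seed(0)  # fixes the Python, NumPy, and PyTorch RNG state used for sampling
model = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-7b-hf",  # illustrative model name
    revision="main",              # in practice, pin an exact commit hash
)
```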
It is very important for a compiler to be deterministic. Otherwise you can't validate the integrity of binaries! We already have issues with reproducibility without adding this shit into the mix.
I am curious about the CUDA assembly: does it work at the CUDA -> PTX level, or PTX -> SASS? I have done some work on SASS optimization, and it would be a lot easier if an LLM could be applied at the SASS level.
Thank you for freeing me from one of my to-do projects. I wanted to do a similar autoencoder with optimisations. Did you write about it anywhere? I'd love to read the details.
Then maybe don't name it "LLM Compiler" -- just "Compiler Guidance with LLMs" or "LLM-aided Compiler Optimization" or something. It would be much more to the point without overpromising.
As this LLM operates on the LLVM intermediate representation, its output can be fed into https://alive2.llvm.org/ce/ and formally verified. For those who don't know what to put there, here is an example with the C++ spaceship operator: https://alive2.llvm.org/ce/z/YJPr84 (try replacing -1 with -2 there to break it). This is something of a Swiss Army knife for LLVM developers; they often start optimization work with this tool.

What they missed is any mention of verification (they probably don't know about alive2) and a comparison with other compilers. It is very likely that LLM Compiler "learned" from GCC and, with huge computational effort, simply generates what GCC can do out of the box.
Yep! No GCC on this one. And yep, that's not far off how the pretraining data was gathered -- but with random optimisations to give it a bit of variety.
C++ has operator overloading, so you can define the spaceship operator for any class and get every comparison operator from the fallback definitions, which use `<=>` in some obvious ways.
> Presumably a very generalized model would be good at even doing the inverse: given some assembly, write code that will produce the given assembly.

ChatGPT does this, unreliably.
Unlike many other AI-themed papers from Meta, this one omits any mention of the model's output being used at Instagram, Facebook, or Meta itself. Research is great! But this doesn't seem all that actionable today.
This would be difficult to deploy as-is in production.

There are correctness issues mentioned in the paper around moving the phase orderings away from the well-trodden O0/O1/O2/O3/Os/Oz path. Their methodology works quite well for a research project, but I personally wouldn't trust it in production: while some obvious issues can be caught by a small test suite and unit tests, others won't be, and that's really risky in production scenarios. There are also practical software-engineering questions, like how to deploy this inside the compiler. There is actually tooling in upstream LLVM to do this (https://www.youtube.com/watch?v=mQu1CLZ3uWs), but running models on a GPU would be difficult, and I would expect CPU inference to massively blow up compile times.
I don't understand the purpose of this. It feels like a task for function calling, handing the code off to an actual compiler.

Is there an obvious use case I'm missing?
GPT-6 could write software directly (as assembly) instead of writing C first. There's lots of training data for binaries, and it could train itself by checking whether the program does what it expects it to do.
Reading the title, I thought this was a tool for optimizing and disassembling LLMs, not an LLM designed to optimize and disassemble. Seeing that it's just a model is a little disappointing in comparison.
It's pretty important for compilers / decompilers to be reliable and accurate -- compilers behaving in a deterministic and predictable way is an important fundamental of pipelines.
LLMs are inherently unpredictable, and so using an LLM for compilation / decompilation -- even an LLM that has 99.99% accuracy -- feels a bit odd to include as a piece in my build pipeline.
That said, let's look at the paper and see what they did.
They essentially started with Code Llama, and then trained the model further on three tasks -- one primary, and two downstream.
The first task is compilation: given input code and a set of compiler flags, can we predict the output assembly? Given the inability to verify correctness without using a traditional compiler, this feels like it's of limited use on its own. However, training a model on this as a primary task enables a couple of downstream tasks. Namely:
The second task (and first downstream task) is compiler flag prediction: choosing the set of flags that yields the smallest assembly. It's a bit disappointing that they only seem to be able to optimize for assembly size (and not execution speed), but it's not without its uses. Because the output of this task (compiler flags) is then passed to a deterministic function (a traditional compiler), the instability of the LLM is mitigated.
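This is only an illustration of that pipeline, not the paper's actual interface: `predict_flags` stands in for a call to the model, and the clang invocation and baseline flags are assumptions.

```python
# Sketch of the flag-tuning pipeline: the LLM only proposes flags, and all
# code generation is done by a deterministic, traditional compiler.
import subprocess

def predict_flags(src: str) -> list[str]:
    # Hypothetical stand-in for a call to the model; it returns a fixed
    # baseline here so the sketch runs without any model at all.
    return ["-Oz"]

def compile_with_predicted_flags(src: str, out: str) -> None:
    flags = predict_flags(src)
    # Any nondeterminism in the model only affects which flags get tried,
    # not the correctness of the emitted code.
    subprocess.run(["clang", *flags, src, "-o", out], check=True)

compile_with_predicted_flags("foo.c", "foo")
```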
The third task (second downstream task) is decompilation. This is not the first time that LLMs have been trained to do better decompilation -- however, because of the pretraining that they did on the primary task, they feel that this provides some advantages over previous approaches. Sadly, they only compare LLM Compiler to Code Llama and GPT-4 Turbo, and not against any other LLMs fine-tuned for the decompilation task, so it's difficult to see in context how much better their approach is.
Regarding the verifiability of the decompilation approach, the authors note that there are issues with correctness, so they employ round-tripping -- recompiling the decompiled code (using the same compiler flags) and checking for an exact match against the original assembly. This still puts accuracy at only around 45% (if I understand their numbers correctly), so it's not entirely trustworthy yet, but it might still be useful -- especially if used alongside a traditional decompiler, with this model's outputs used only when they are verifiably correct.
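A minimal sketch of that round-trip check, under the assumption that the original and the model's decompiled output are both available as C source (the clang invocation and exact-match criterion are my illustration, not the paper's exact harness):

```python
# Round-trip check: recompile the model's decompiled source with the same
# flags and accept it only if the emitted assembly matches exactly.
import subprocess

def asm_of(c_path: str, flags: list[str]) -> str:
    # Emit assembly to stdout so the two versions can be compared as text.
    result = subprocess.run(["clang", *flags, "-S", "-o", "-", c_path],
                            check=True, capture_output=True, text=True)
    return result.stdout

def round_trip_ok(original_c: str, decompiled_c: str, flags: list[str]) -> bool:
    # Exact match is conservative; on a mismatch you would fall back to a
    # traditional decompiler rather than trusting the model's output.
    return asm_of(original_c, flags) == asm_of(decompiled_c, flags)
```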
Overall I'm happy to see this model released, as it seems like an interesting use case. I may need to read more, but at first blush I'm not immediately excited by the possibilities this unlocks. Most of all, I would like to see whether these methods could be extended to optimize for performance -- not just assembly size.