The computer did find the answers itself. I.e., it found "even integers" for P1, "{1,1}" for P2, and "2" for P6. It then also provided a Lean proof in each case.
Huh, so MCTS to find the ‘best’ token using a (relatively) small, quick language model? Sounds like an interesting approach to small model text generation too…
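For intuition, here is a toy sketch of that idea: a flat, one-ply Monte Carlo search with UCB1 over candidate next tokens (a simplified cousin of full MCTS). The vocabulary, the rollout length, and `score_fn` (a stand-in for a small model's value estimate) are all invented for illustration, not anything from DeepMind's system:

```python
import math
import random

def pick_next_token(prefix, vocab, score_fn, n_sims=200, c=1.4):
    """Toy one-ply Monte Carlo search: estimate each candidate token's value
    by random rollouts, balancing exploration and exploitation with UCB1."""
    stats = {tok: [0, 0.0] for tok in vocab}  # token -> [visits, total reward]

    for sim in range(1, n_sims + 1):
        def ucb(tok):
            visits, total = stats[tok]
            if visits == 0:
                return float("inf")  # try every token at least once
            return total / visits + c * math.sqrt(math.log(sim) / visits)

        tok = max(vocab, key=ucb)
        # "Rollout": extend the prefix with a few random tokens, then let the
        # (stand-in) small model score the whole continuation.
        rollout = prefix + [tok] + random.choices(vocab, k=3)
        reward = score_fn(rollout)
        stats[tok][0] += 1
        stats[tok][1] += reward

    # The most-visited token is the robust choice, as in standard MCTS.
    return max(vocab, key=lambda t: stats[t][0])

# Toy usage: the "model" here just rewards continuations with many 'a' tokens.
vocab = ["a", "b", "c"]
print(pick_next_token(["a"], vocab, lambda seq: seq.count("a") / len(seq)))
```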
> I doubt they would claim to solve the problem if they provided half of the answer.

This falls under "extraordinary claims require extraordinary proof", and we have seen nothing of the sort.
You'd think that, but Timothy Gowers (the famous mathematician they worked with) wrote (https://x.com/wtgowers/status/1816509817382735986):

> However, LLMs are not able to autoformalize reliably, so they got them to autoformalize each problem many times. Some of the formalizations were correct, but even the incorrect ones were useful as training data, as often they were easier problems.

So they didn't actually solve autoformalization, which is why they still needed humans to translate the input IMO 2024 problems. The reason formalization is harder than you think is that there is no way to know if you got it right. You can use reinforcement learning with proofs and have a clear signal from the proof checker. We don't have a way to verify formalizations the same way.
You write proofs in a formal language that can be machine checked. If the checker is happy, the proof is correct (unless there is a bug in the checker, but that is unlikely).
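As a minimal illustration (the theorem is a toy example, not AlphaProof output), Lean 4 accepts this file only if the proof actually checks:

```lean
-- The Lean checker accepts this only if the proof really establishes the
-- stated theorem; a wrong or incomplete proof is rejected at compile time.
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b
```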
They said the incorrectly formalized ones are usually easier, so I assume they just hire humans to solve them in the old way until the AI is smart enough to solve these easier problems.
I think that's exaggerating a bit. If you are familiar with both Lean and LaTeX, then I think transcribing these problems into Lean only takes about twice as long as transcribing them into LaTeX.
I, as someone with a maths degree who hasn't done this kind of thing for half a decade, was able to immediately guess the solution to (1), but actually proving it is much harder.
Problems 2 and 5 have a lot of 1s. Sometimes there is an interesting advance that is easier than the full solution. Also, in problem 6 the only hope for most contestants was to get that 1 point.
Fair enough. Although I think the question is whether P3/P5 were given zero points vs. not evaluated vs. evaluated but not published. I don't think it is surprising that Lean-verified proofs get a 7.
One key difference between giving humans more time and giving computer programs more time is that historically we have had more success making the latter run faster than the former.
Only slightly more general. It only works for games that are zero-sum and deterministic, have no hidden information, and have discrete game states and moves. Other examples include Connect Four.
Theorem proving can be formulated as a game; see e.g. https://plato.stanford.edu/entries/logic-games/. Interactive theorem provers can verify that a proof is correct (and related subproblems, such as that a lemma application is valid).
Conceptually, if you're trying to show a conjunction, then it's the other player's turn and they ask you for a proof of a particular case. If you're trying to show a disjunction, then it's your turn and you're picking a case. "Forall" is a potentially infinite conjunction, "exists" is a potentially infinite disjunction. In classical logic this collapses somewhat, but the point is that this is still a search problem of the same kind. If you want to get a feel for this yourself, try out some proofs in Lean or Coq, as in the sketch below. :)
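To make the game reading concrete, here is a minimal Lean 4 sketch with invented propositions: a conjunction makes you answer both of the opponent's demands, a disjunction lets you pick a side, and an existential lets you pick a witness.

```lean
-- Conjunction: the "opponent" may demand either component, so we supply both.
example (p q : Prop) (hp : p) (hq : q) : p ∧ q := by
  constructor
  · exact hp
  · exact hq

-- Disjunction: it is "our move", so we pick the side we can win.
example (p q : Prop) (hq : q) : p ∨ q := by
  right
  exact hq

-- Existential: a potentially infinite disjunction; we choose the witness.
example : ∃ n : Nat, n + 1 = 3 := by
  exact ⟨2, rfl⟩
```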
For some problems, validation is expensive: think of the particle collider, the space telescope, or testing the COVID vaccine. It's actually validation that is the bottleneck in search, not ideation.
What about ℕ? Seems pretty infinite to me, unless by "actually" you mean finite in time and space, which would make your argument a tautology. Or am I missing something?
You mean that by improving search we can solve any problem? What if the solution space is infinite? Even if we make the search algorithm 10x or 100x more performant, the solution space will still be infinite, no?
Can you give some context on how using Lean benefits?
In my understanding, proofs are usually harder to transcribe into Lean, which is why nobody _writes_ proofs using Lean. What is nlinarith?
I think you might be thinking of the recent project to start formalizing Fermat's Last Theorem? The Riemann hypothesis has been easy to state (given what's in Mathlib) for years.
I'm seriously jealous of the people getting paid to work on this. Sounds great fun and must be incredibly satisfying to move the state of the art forward like that.
It's funny: if I had to describe my entire career, it would probably be something like software janitor/maintenance worker.
I guess I should have pursued a PhD when I was younger.
Replace "this thing is better than Terence Tao at maths" with "this thing is better than Garry Kasparov at chess" and your statement would sound equally reasonable in the early 90s.
Because "Generally Intelligent" is a very broad and vague term.
"Better than Terence Tao at solving certain formalized problems" (not necessarily equal to "better than Terence Tao at maths") isn't.
They didn't formalize the entirety of math. Good thing the IMO doesn't need the entirety. But they didn't even formalize enough for the IMO--this is probably why the combinatorics problem wasn't solved.
Tangentially: I found it fascinating to follow along with the solution to Problem 6: https://youtu.be/7h3gJfWnDoc ("aquaesulian" is a nod to the ancient name of Bath). There's no advanced math and each step is quite simple; I'd guess it's at about an 8th-grade level.
Note that the 6th question is generally the hardest ("final boss") and many top performers couldn't solve it. I don't know what Lean is or how to read the AI's proofs, but an AI system that can explain such a question on par with the YouTuber above would be fantastic!
Don't confuse interpolation with extrapolation. Curing cancer will require new ideas; the IMO requires proficiency in tasks where the methods of solving are known.
The writing was on the wall for the last year and a half (in fact I lost a bet to an IMO medalist about AI getting IMO gold by 8/2023), but three years ago this was unimaginable.
> They are building a general pipeline which turns informal natural language mathematics

but this part currently sucks, because they didn't trust it and formalized the problems manually.

> First, the problems were manually translated into formal mathematical language for our systems to understand.

The non-geometry problems that were solved were all of the form "Determine all X such that…", and the resulting theorem statements are all of the form "We show that the set of all X is {foo}". The downloadable solutions from https://storage.googleapis.com/deepmind-media/DeepMind.com/B... don't make it clear whether the set {foo} was decided by a human during this translation step, or whether the computer found it. I want to believe that the computer found it, but I can't find anything to confirm. Anyone know?
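For context, a "Determine all X" problem formalized in Lean bakes the answer set into the theorem statement, which is exactly why it matters who supplied {foo}. A hypothetical toy sketch of that shape (not DeepMind's actual statement; assumes Mathlib):

```lean
import Mathlib

-- Hypothetical shape of a "Determine all X such that …" formalization:
-- the answer set appears in the theorem statement itself, so whoever
-- writes the statement must already know (or guess) the answer.
example : {n : ℕ | n < 2} = {0, 1} := by
  ext n
  simp only [Set.mem_setOf_eq, Set.mem_insert_iff, Set.mem_singleton_iff]
  omega
```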