George Tsoukalas
@gtsoukal
Followers
517
Following
554
Media
3
Statuses
89
AlphaEvolve @ DeepMind, AI for Math @ UT Austin.
Joined September 2022
Big thanks to @axiommathai for kindly contributing the formalized Putnam 2025 problem statements to PutnamBench! All 12 statements are now available publicly!
1
3
51
The 2025 Putnam Competition is on Saturday! Excited to see how the new models from AI4Math companies fare on these new, uncontaminated problems! We will be sure to add them to PutnamBench!
0
1
24
Will be presenting CLEVER at #NeurIPS2025 (San Diego) on December 3rd at 4:30 PM in Exhibit Hall C, D, E, poster location 1411. If you are interested in verified code generation, please visit our CLEVER poster.
1/🧵Excited to share CLEVER — a new benchmark for end-to-end verified code generation in Lean. Can we go from natural language to a formally verified Lean program? CLEVER puts this to the test. 📄 https://t.co/oXa2iNFJE0 💻 https://t.co/YhW8GDKlZG
1
3
8
New leader on the PutnamBench leaderboard! Getting close to saturation now, next big target will be optimizing cost for the same proving performance! Congrats to the Logical Intelligence team!
Our Aleph prover agent just hit #1 on PutnamBench, a benchmark built from Putnam problems - one of the hardest college-level math olympiads - fully formalized with machine-checked proofs and no human involvement. Putnam problems are often considered harder than IMO problems and span
0
5
30
In San Diego for #NeurIPS2025 from Dec. 1 to 8. Reach out if you want to chat about AI for math and science!
1
0
10
We'll be presenting it at NeurIPS 2025 in San Diego next month, where it was awarded a spotlight presentation! Happy to chat more about it, please reach out if interested!
1
0
5
Our paper is available at https://t.co/yRV3vlSJCA (and hopefully arxiv soon!). This work was done with my fantastic collaborators Rahul Saha @rah4927, Amitayush Thakur @AmitayushThakur, Sabrina Reguyal, and Swarat Chaudhuri @swarat. It wouldn't have been possible without them!
2
0
7
We carry out all our experiments in Fermat, an open-sourced environment for mathematical theory exploration available here: https://t.co/BrOxfDMAN8.
1
1
4
We run our experiments in elementary number theory & finite fields, and find that we can produce interestingness functions that outperform the base heuristics available in HR. Our approach can recover some interesting concepts in number theory, like primality, but can't yet find FLT.
1
0
2
The numerical reward is attached to the interestingness function and used inside the FunSearch-like loop.
1
0
2
The idea is that if a system can find many concepts that humans find interesting, it may also be able to find concepts that we haven't considered but that could be very valuable.
1
0
2
We measure the value of an interestingness function by sampling trajectories with the policy and checking the generated theories against a ground-truth set of human-made concepts. That comparison forms the interestingness signal.
1
0
2
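A minimal sketch of the evaluation idea in the post above, not the paper's actual metric: roll out a policy several times, collect the concepts each rollout produces, and average the fraction of ground-truth concepts recovered. The names `sample_theory` and `score`, the string representation of concepts, and the recall-style measure are all assumptions made for illustration.

```python
import random
from typing import Callable, Set

# Hypothetical policy type: given a random source, emit one concept name.
Policy = Callable[[random.Random], str]


def sample_theory(policy: Policy, steps: int, rng: random.Random) -> Set[str]:
    """Roll out one trajectory and return the set of concepts it produced."""
    return {policy(rng) for _ in range(steps)}


def score(policy: Policy, ground_truth: Set[str],
          n_rollouts: int = 20, steps: int = 5, seed: int = 0) -> float:
    """Average fraction of ground-truth concepts recovered per rollout.

    This recall-style measure is an assumption, chosen only to show the
    shape of the computation.
    """
    if not ground_truth:
        raise ValueError("ground_truth must be non-empty")
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rollouts):
        theory = sample_theory(policy, steps, rng)
        total += len(theory & ground_truth) / len(ground_truth)
    return total / n_rollouts
```

A policy that only ever produces one of two ground-truth concepts would score 0.5 under this measure, which is the kind of number that could then be fed back as a reward.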
Given an interestingness function, we use it to form a policy that explores mathematical theory-space by selecting actions that produce concepts & conjectures. All of this happens in an RL environment we call Fermat, which we open-source!
1
0
2
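One way to turn an interestingness function into a policy, sketched under assumptions: score each candidate outcome and sample from a softmax over the scores. The post does not say how the policy is actually formed, so the softmax rule, the `temperature` parameter, and the name `softmax_policy` are hypothetical.

```python
import math
import random
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def softmax_policy(candidates: Sequence[T],
                   interestingness: Callable[[T], float],
                   temperature: float = 1.0,
                   rng: Optional[random.Random] = None) -> T:
    """Sample one candidate with probability proportional to exp(score / T).

    Lower temperatures make the choice closer to greedy selection of the
    most interesting candidate.
    """
    if not candidates:
        raise ValueError("candidates must be non-empty")
    rng = rng or random.Random(0)
    scores = [interestingness(c) / temperature for c in candidates]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    return rng.choices(list(candidates), weights=weights, k=1)[0]
```

Sampling rather than always taking the top-scoring candidate keeps some exploration in the rollouts, which matters when the interestingness function is itself still being optimized.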
In our work, we focus on learning how to generate a function which measures the interestingness of mathematical objects. In particular, we view this function as living in the space of programs, and design a FunSearch variant to optimize it.
1
0
2
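The shape of a FunSearch-like loop can be sketched as follows. This is a toy under stated assumptions, not the variant from the paper: a "program" is reduced to a pair of weights, and the hypothetical `mutate` function uses random perturbation in place of the language-model proposal step that FunSearch relies on.

```python
import random
from typing import Callable, List, Tuple

# Toy stand-in for a program: the two weights of a linear scoring rule.
Program = Tuple[float, float]


def mutate(program: Program, rng: random.Random) -> Program:
    """Hypothetical stand-in for the LLM proposal step: perturb the weights."""
    a, b = program
    return (a + rng.gauss(0.0, 0.5), b + rng.gauss(0.0, 0.5))


def search(reward: Callable[[Program], float], initial: Program,
           iterations: int = 200, pool_size: int = 5,
           seed: int = 0) -> Tuple[float, Program]:
    """Keep a small pool of the best programs and repeatedly mutate one."""
    rng = random.Random(seed)
    pool: List[Tuple[float, Program]] = [(reward(initial), initial)]
    for _ in range(iterations):
        _, parent = rng.choice(pool)           # pick a parent from the pool
        child = mutate(parent, rng)            # propose a modified program
        pool.append((reward(child), child))    # evaluate and add it
        pool.sort(key=lambda item: item[0], reverse=True)
        del pool[pool_size:]                   # keep only the top programs
    return pool[0]
```

Because the pool always retains its best member, the returned reward can never be worse than that of the initial program.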
For example, making a system that can prove Fermat's Last Theorem seems difficult, but what about a system that can *find* its statement? One that tells you it is interesting? How could that be done?
1
1
3
Systems like HR (2000), AM (1976), and Graffiti (1988) synthesize new concepts & conjectures. A central challenge for them was learning to come up with the right concepts, ones that capture what humans find interesting.
1
0
3