
George Tsoukalas
@gtsoukal
Followers: 256
Following: 457
Media: 0
Statuses: 53
PhD student at UT Austin interested in automatic theorem proving.
Joined September 2022
RT @AmitayushThakur: 1/🧵Excited to share CLEVER — a new benchmark for end-to-end verified code generation in Lean. Can we go from natural l….
0
12
0
PutnamBench: A math benchmark where no reasoning model can solve even a single problem! We evaluated leading LRMs on the Lean 4 version 🧵
Announcing PutnamBench: an evaluation benchmark for formal mathematical reasoning in Lean 4, Isabelle, and Coq! PutnamBench consists of problems from the William Lowell Putnam Mathematical Competition, the premier collegiate mathematics exam in the US & Canada. 🧵
8
8
75