Jiri Gesi ✈️NeurIPS ✈️ @JIRIGESI X Profile

Followers: 175

Following: 656

Media: 10

Statuses: 144

Post training @amazon, previous @UCIrvine

https://t.co/S6htUmLczj

Joined September 2015

Dakuo Wang

@dakuowang

6 days

Our team will be giving a demo on “LLM Agent as Digital Twins of Online Shopping Customers” at #NeurIPS2025. This is a collaboration between @amazon and the @Northeastern human-centered AI lab. We are actively hiring PhD students, postdocs, and interns. Stop by the Amazon booth tomorrow

2

4

16

Jiri Gesi ✈️NeurIPS ✈️

@JIRIGESI

12 days

I’ll be at NeurIPS. If you’re interested in a 2026 PhD research internship with Amazon Store Foundation AI and want to work on agents, RL, and multi-modal, I’d love to connect at the conference.

3

1

17

Andrej Karpathy

@karpathy

2 months

Hah, judging by mentions overnight, people seem to find the ghost analogy provocative. I swear I don't wake up just trying to come up with new memes, but to elaborate briefly why I thought it was a fun comparison: 1) It captures the idea that LLMs are purely digital artifacts that

88

79

1K

Bohan Lyu

@Lyubh22

2 months

Building upon Goedel-Prover-V2, Hilbert Prover achieved 99.2% on MiniF2F and solved over 70% of PutnamBench problems😱 Amazing news from my old home, @yuqirose's lab. At ICML this year, someone asked why the model struggled with Putnam problems. I said it was a matter of time, and

0

4

17

Jiri Gesi ✈️NeurIPS ✈️

@JIRIGESI

2 months

🪢 Careful SFT + 🧩 token-adaptive weighting helps avoid catastrophic forgetting 🧠

Jiacheng Lin

@jclin808

2 months

📉 SFT might not suffer as much catastrophic forgetting as you think. Lately, there has been much debate around GRPO in the community. RL is hot, but let’s not forget that in the context of LLMs, SFT is the bedrock of almost all RL. Also, there’s still a lot we don’t fully understand about SFT.

0

Yong Lin

@Yong18850571

4 months

The report of Goedel-Prover-V2 is on arXiv now: https://t.co/yROjbJMVgP. Check out the details on self-correction, the large-scale scaffolded data synthesis framework, and the magical model averaging.

9

107

308

Chi Jin

@chijinML

4 months

Many friends still ask me about AI for IMO, formal vs informal math. Some quick thoughts: IMO results: GDM and OpenAI achieved gold using informal (natural language) methods. ByteDance and AlphaProof (last year) got gold/silver using formal methods (Lean + specialized geometry

12

40

372

Princeton Computer Science

@PrincetonCS

5 months

⏱️ AI is making the verification process easier, with models verifying proofs in minutes. 💻 Now, @prfsanjeevarora, @chijinML, @danqi_chen and @PrincetonPLI have released Goedel Prover V2, a model more efficient and more accurate than any previous model. 👉 https://t.co/v7500VNytz

1

21

96

WebAgentlab

@webagentlab

5 months

Shop-R1: Rewarding LLMs to Simulate Human Behavior in Online Shopping via Reinforcement Learning

The paper introduces Shop-R1, a reinforcement learning framework that significantly enhances the simulation of realistic online shopping behaviors using Large Language Models by

1

2

Chi Jin

@chijinML

5 months

While IMO is trending, our model leads on college-level math (Putnam Benchmark)—nearly doubling the problems solved by prior SOTA, with formal, verifiable proofs! Moreover, it’s not just an announcement—you can actually download and use our model. 🙂

Yong Lin

@Yong18850571

5 months

🔥 Our Goedel-Prover-V2-32B topped the PutnamBench Leaderboard by solving 86 problems, nearly 2× more than the previous SOTA DeepSeek-Prover-V2-671B (solved 47), while using:
* 1/20 the model size (32B vs. 671B)
* 1/5 the passes (184 vs. 1024)
Meanwhile, we also release *

4

23

168

Chi Jin

@chijinML

5 months

Congrats! As a scientist/mathematician trained to verify things rigorously, I'm curious—will we get to see a bit more than tweets and final outputs (e.g., how they were generated/selected) to verify the claims? 🙂

Alexander Wei

@alexwei_

5 months

1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).

4

2

106

Chi Jin

@chijinML

5 months

I will also give a talk about theorem proving and Goedel-Prover V2 at 12:45 today at @ai4mathworkshop. Drop by our talk and poster if you are at ICML!

Bohan Lyu

@Lyubh22

5 months

Goedel Prover V2 (https://t.co/Xewuj90yGf) will be featured at @ai4mathworkshop today. Come and discuss with us!

0

8

30

Jiri Gesi ✈️NeurIPS ✈️

@JIRIGESI

5 months

Shout-out to the best theorem prover model to date!

Yong Lin

@Yong18850571

5 months

(1/4) 🚨 Introducing Goedel-Prover V2 🚨
🔥🔥🔥 The strongest open-source theorem prover to date.
🥇 #1 on PutnamBench: Solves 64 problems with far less compute.
🧠 New SOTA on MiniF2F:
* 32B model hits 90.4% at Pass@32, beating DeepSeek-Prover-V2-671B’s 82.4%.
* 8B > 671B: Our 8B

0

DeepSeek

@deepseek_ai

11 months

🚀 DeepSeek-R1 is here! ⚡ Performance on par with OpenAI-o1 📖 Fully open-source model & technical report 🏆 MIT licensed: Distill & commercialize freely! 🌐 Website & API are live now! Try DeepThink at https://t.co/v1TFy7LHNy today! 🐋 1/n

2K

7K

36K

Qwen

@Alibaba_Qwen

1 year

Qwen2.5 Technical Report https://t.co/09b9WvA9pY

13

236

1K

Conference on Language Modeling

@COLM_conf

1 year

Announcement #1: our call for papers is up! 🎉 https://t.co/o8Mv1ywQwZ And excited to announce the COLM 2025 program chairs @yoavartzi @eunsolc @RanjayKrishna and @AdtRaghunathan

1

43

163

Nathan Lambert

@natolambert

1 year

First slide deck for NeurIPS is done -- a short overview of how I view post-training for applications. A higher level summary on the key decisions along the way of scoping a problem, choosing a base model, optimization algorithm, etc. (Plus some thoughts on OpenAI's RL

4

31

285

Graham Neubig

@gneubig

1 year

We are now done with all classes for CMU CS11-711 Advanced NLP! Slides: https://t.co/zY0CRx4NVw Videos: https://t.co/FZt0FLv6v4 Hope this is useful to people 😀

youtube.com

Videos for Carnegie Mellon University's CS 11-711 Advanced NLP by Graham Neubig. Class Site: https://phontron.com/class/anlp-fall2024/

Graham Neubig

@gneubig

1 year

We started the Fall 2024 version of CMU CS11-711 Advanced NLP🎓 Follow along to learn about the latest in NLP, LLMs, Agents, etc. * Materials: https://t.co/LETEcVsBJl * Videos:

6

91

480

Stanford NLP Group

@stanfordnlp

1 year

Great article about our newest @stanfordnlp faculty member @Diyi_Yang in @Stanford Report: “I am passionate about developing a future where humans and AIs can collaborate to achieve greater collective intelligence in a variety of contexts, education, healthcare, & the workplace”

4

35

222

Shunyu Yao

@ShunyuYao12

1 year

Had a fun time delivering the language agent tutorial (https://t.co/UlDj9S4BfC) with @ysu_nlp @Diyi_Yang @taoyds @emnlpmeeting! Thanks for joining and asking good qs!

5

16

219