Hamed Mahdavi Profile
Hamed Mahdavi

@HamedMahdavi93

Followers: 662 · Following: 4K · Media: 27 · Statuses: 422

Ph.D. Student at Pennsylvania State University

Pennsylvania, USA
Joined April 2019
@HamedMahdavi93
Hamed Mahdavi
2 months
(1/n) Until recently, even strong LLMs struggled with USAMO/IMO problems. This year, specific model variants from Google and OpenAI were reported to solve 5/6 IMO problems. In our recent work, we asked a relevant question: Can we grade proofs fairly with partial credit using
1
22
96
@pegahmdp
Pegah Mohammadipour
13 hours
🚀 Now in PRA! Lindblad dynamics normally demand huge depth. We run multiple shallow Lindblad simulations (Kraus form or dilated Hamiltonians) and extrapolate. This yields polylog depth, Gevrey smoothness, and rigorous bias–variance guarantees. https://t.co/IudeayTt4W
0
1
2
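A hedged toy of the "many shallow runs, then extrapolate" idea in the post above. This is not the paper's algorithm: a first-order classical Euler integrator of a single-qubit amplitude-damping Lindbladian stands in for the shallow quantum simulation (Kraus or dilated-Hamiltonian form), and every name and parameter below is an assumption made for the demo. It only shows how several low-depth (coarse time-step) runs plus a polynomial extrapolation to zero step size can recover the exact observable:

```python
import numpy as np

# Demo assumption: single-qubit amplitude damping, jump operator = sigma_minus.
gamma = 1.0
L = np.array([[0.0, 1.0], [0.0, 0.0]])  # lowering operator |1> -> |0>
LdL = L.conj().T @ L

def lindblad_rhs(rho):
    """Dissipator of the Lindblad master equation (no Hamiltonian term)."""
    return gamma * (L @ rho @ L.conj().T - 0.5 * (LdL @ rho + rho @ LdL))

def evolve_shallow(rho0, t, steps):
    """Stand-in for one shallow simulation: first-order Euler, few time slices."""
    rho, dt = rho0.astype(complex), t / steps
    for _ in range(steps):
        rho = rho + dt * lindblad_rhs(rho)
    return rho

rho0 = np.array([[0.0, 0.0], [0.0, 1.0]])  # start in the excited state |1><1|
t = 1.0

# Several shallow runs at different depths ...
depths = [4, 8, 16]
h = np.array([t / d for d in depths])                           # step sizes
y = np.array([evolve_shallow(rho0, t, d)[1, 1].real for d in depths])

# ... then extrapolate the observable to zero step size (degree-2 fit in h).
extrapolated = np.polyfit(h, y, 2)[-1]
exact = np.exp(-gamma * t)  # closed form for amplitude damping
print(f"coarsest: {y[0]:.4f}  extrapolated: {extrapolated:.4f}  exact: {exact:.4f}")
```

As in the post, no individual run needs to be deep; the bias of the shallow runs is removed in classical post-processing, at the cost of repeating the simulation a few times.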
@HamedMahdavi93
Hamed Mahdavi
2 days
What are the best data curation and synthetic data works you’ve seen this year at NeurIPS? Share them with me.
0
0
4
@HamedMahdavi93
Hamed Mahdavi
2 days
We arrived just 20 minutes before they closed the gates for our connecting flight.
0
0
1
@behrouz_ali
Ali Behrouz
2 days
We keep scaling model parameters by increasing width and stacking more layers, but what if the truly missing axes for continual learning are compression and stacking the learning process? Excited to share the full version of Nested Learning, a new paradigm for continual learning
28
150
965
@niloofar_mire
Niloofar
2 days
Had a blast talking about privacy and agentic AI at the @farairesearch alignment workshop! 1. Stop worrying about memorization as a privacy concern 2. Optimizing for math and coding tasks is NOT going to give us models that are better for *humans*! (See graph!) Slides ⬇️
2
11
212
@HamedMahdavi93
Hamed Mahdavi
2 days
Don't do Best-of-N, do Majority-of-Bests! Follow this nice work by @AminRakhsha, @AmirKhasahmadi and @SoloGen.
@AminRakhsha
Amin Rakhsha
3 days
We are presenting our paper on test-time compute at #NeurIPS2025 🤔Running Best-of-N 1000 times and picking the most frequent answer works better than a single BoN. We make it cheap✨ Don't generate new outputs for each run. Sample with replacement from the existing ones! 🧵
0
1
12
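A minimal sketch of the Majority-of-Bests recipe quoted above, under assumptions of mine: a pool of already-generated answers scored by some reward model, a resample size n smaller than the pool, and toy data; none of the names or numbers come from the paper.

```python
import random
from collections import Counter

def best_of_n(answers, rewards, n, rng):
    """One Best-of-N run: draw n candidates *with replacement* from the
    existing pool (no new generations) and keep the highest-scoring one."""
    idx = rng.choices(range(len(answers)), k=n)
    return answers[max(idx, key=lambda i: rewards[i])]

def majority_of_bests(answers, rewards, n, runs=1000, seed=0):
    """Repeat Best-of-N many times over the same pool and return the
    answer that wins most often."""
    rng = random.Random(seed)
    winners = Counter(best_of_n(answers, rewards, n, rng) for _ in range(runs))
    return winners.most_common(1)[0][0]

# Toy pool: "42" appears often with solid scores, while "41" carries the
# single highest score, so picking the pool-wide best (one big BoN) gives "41".
answers = ["42", "42", "41", "42", "43", "42", "42", "42"]
rewards = [0.80, 0.85, 0.95, 0.75, 0.70, 0.90, 0.65, 0.78]
print(majority_of_bests(answers, rewards, n=3))  # majority of bests -> "42"
```

Because each run only resamples indices from the existing pool, the thousand repeats cost CPU bookkeeping rather than extra model calls, which is what makes the trick cheap.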
@HamedMahdavi93
Hamed Mahdavi
2 days
It’s snowing on the East Coast right now, but I’ll be in San Diego soon for NeurIPS! I work on reasoning, synthetic data, and agentic workflows for reasoning. I’m open to jobs, internships, and collaborations. Always happy to chat, whether in person or via DM😎
0
0
6
@AminRakhsha
Amin Rakhsha
3 days
We are presenting our paper on test-time compute at #NeurIPS2025 🤔Running Best-of-N 1000 times and picking the most frequent answer works better than a single BoN. We make it cheap✨ Don't generate new outputs for each run. Sample with replacement from the existing ones! 🧵
1
10
22
@AmirKhasahmadi
Amir Khasahmadi
3 days
Our new paper on LLM test-time computation! #NeurIPS2025 Majority-of-the-Bests (MoB) improves Best-of-N with negligible CPU cost. Check it out!
@AminRakhsha
Amin Rakhsha
3 days
We are presenting our paper on test-time compute at #NeurIPS2025 🤔Running Best-of-N 1000 times and picking the most frequent answer works better than a single BoN. We make it cheap✨ Don't generate new outputs for each run. Sample with replacement from the existing ones! 🧵
0
3
10
@aviral_kumar2
Aviral Kumar
8 days
🚨🚨New blog post led by CMU students: Want to know why LLM RL training plateaus on hard problems & scaling compute may not help? And how to fix this issue? Turns out it stems from a coupling of poor exploration & optimization. Classical ways to explore don't work, but ours
6
44
248
@HannaHajishirzi
Hanna Hajishirzi
16 days
Meet DR Tulu: our open deep-research agent built for long-form, open-ended deep research tasks, trained with our new RLER method. DR Tulu rivals or even outperforms proprietary deep-research systems from Perplexity and OpenAI on several benchmarks.
@allen_ai
Ai2
16 days
Today we’re releasing Deep Research Tulu (DR Tulu)—the first fully open, end-to-end recipe for long-form deep research, plus an 8B agent you can use right away. Train agents that plan, search, synthesize, & cite across sources, making expert research more accessible. 🧭📚
2
8
38
@pushmeet
Pushmeet Kohli
22 days
Last year, AlphaProof & AlphaGeometry reached a key landmark in AI by achieving silver medal level performance at the International Math Olympiad. Today, @Nature is publishing the methodology behind our amazing agent AlphaProof! @GoogleDeepMind Paper:
nature.com
Nature - Olympiad-level formal mathematical reasoning with reinforcement learning
8
84
439
@aarashfeizi
Aarash Feizi
22 days
🚀 Announcing GroundCUA, a high-quality dataset for grounding computer-use agents. With over 3M expert annotations spanning 87 desktop apps, we use our new dataset to train state-of-the-art grounding models, namely GroundNext-3B and GroundNext-7B. 👇 Thread
5
31
81
@a_kazemnejad
Amirhossein Kazemnejad
22 days
Computer-use agents don’t touch the UI anymore; they do the high-level planning and call a "grounding" agent to click and type. @aarashfeizi et al. proposed a recipe for creating SOTA grounding agents, from data collection to RL pipeline design. Check it out.
@aarashfeizi
Aarash Feizi
22 days
🚀 Announcing GroundCUA, a high-quality dataset for grounding computer-use agents. With over 3M expert annotations spanning 87 desktop apps, we use our new dataset to train state-of-the-art grounding models, namely GroundNext-3B and GroundNext-7B. 👇 Thread
0
1
17
@HamedMahdavi93
Hamed Mahdavi
22 days
LLM-generated reviews be like: "The paper presents a self-driving car. However, a key limitation is that it does not fly."
0
0
8
@niloofar_mire
Niloofar
23 days
I'm really excited about our new paper!! 📣 'Reinforcement Learning Improves Traversal of Hierarchical Knowledge in LLMs' Contrary to the belief that RL fine-tuning degrades memorized knowledge, RL-enhanced models consistently outperform base/SFT on knowledge recall by 24pp! RL teaches
18
50
421
@HamedMahdavi93
Hamed Mahdavi
25 days
1
2
23
@HamedMahdavi93
Hamed Mahdavi
25 days
This @aviral_kumar2 lecture is AMAZING.
1
29
290
@code_star
Cody Blakeney
26 days
She pivoted to midtraining research
@AutismCapital
Autism Capital 🧩
26 days
Whatever happened to Marie Kondo? Is she still sparking joy?
0
4
52
@pouria_mahdavi
Pouria Mahdavinia
27 days
Btw, I got this idea from James Martens' work, and I suggest reading it to understand how working optimizers for deep neural networks are developed: https://t.co/Oo4vZLY7cq. James Martens has also been a contributor to probably one of the first large NNs ever trained at
0
1
2