Hamed Mahdavi Profile
Hamed Mahdavi

@HamedMahdavi93

Followers: 662 · Following: 4K · Media: 27 · Statuses: 422

Ph.D. Student at Pennsylvania State University

Pennsylvania, USA
Joined April 2019
@HamedMahdavi93
Hamed Mahdavi
2 months
(1/n) Until recently, even strong LLMs struggled with USAMO/IMO problems. This year, specific model variants from Google and OpenAI were reported to solve 5/6 IMO problems. In our recent work, we asked a relevant question: Can we grade proofs fairly with partial credit using
1
22
96
@pegahmdp
Pegah Mohammadipour
13 hours
🚀 Now in PRA! Lindblad dynamics normally demand huge depth. We run multiple shallow Lindblad simulations (Kraus form or dilated Hamiltonians) and extrapolate. This yields polylog depth, Gevrey smoothness, and rigorous bias–variance guarantees. https://t.co/IudeayTt4W
0
1
2
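A hedged toy of the "many shallow runs, then extrapolate" idea in the post above. This is not the paper's algorithm: a first-order classical Euler integrator of a single-qubit amplitude-damping Lindbladian stands in for the shallow quantum simulation (Kraus or dilated-Hamiltonian form), and every name and parameter below is an assumption made for the demo. It only shows how several low-depth (coarse time-step) runs plus a polynomial extrapolation to zero step size can recover the exact observable:

```python
import numpy as np

# Demo assumption: single-qubit amplitude damping, jump operator = sigma_minus.
gamma = 1.0
L = np.array([[0.0, 1.0], [0.0, 0.0]])  # lowering operator |1> -> |0>
LdL = L.conj().T @ L

def lindblad_rhs(rho):
    """Dissipator of the Lindblad master equation (no Hamiltonian term)."""
    return gamma * (L @ rho @ L.conj().T - 0.5 * (LdL @ rho + rho @ LdL))

def evolve_shallow(rho0, t, steps):
    """Stand-in for one shallow simulation: first-order Euler, few time slices."""
    rho, dt = rho0.astype(complex), t / steps
    for _ in range(steps):
        rho = rho + dt * lindblad_rhs(rho)
    return rho

rho0 = np.array([[0.0, 0.0], [0.0, 1.0]])  # start in the excited state |1><1|
t = 1.0

# Several shallow runs at different depths ...
depths = [4, 8, 16]
h = np.array([t / d for d in depths])                           # step sizes
y = np.array([evolve_shallow(rho0, t, d)[1, 1].real for d in depths])

# ... then extrapolate the observable to zero step size (degree-2 fit in h).
extrapolated = np.polyfit(h, y, 2)[-1]
exact = np.exp(-gamma * t)  # closed form for amplitude damping
print(f"coarsest: {y[0]:.4f}  extrapolated: {extrapolated:.4f}  exact: {exact:.4f}")
```

As in the post, no individual run needs to be deep; the bias of the shallow runs is removed in classical post-processing, at the cost of repeating the simulation a few times.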
@HamedMahdavi93
Hamed Mahdavi
2 days
What are the best data curation and synthetic data works you’ve seen this year at NeurIPS? Share them with me.
0
0
4
@HamedMahdavi93
Hamed Mahdavi
2 days
We arrived just 20 minutes before they closed the gates for our connecting flight.
0
0
1
@behrouz_ali
Ali Behrouz
2 days
We keep scaling model parameters by increasing width and stacking more layers, but what if the truly missing axes for continual learning are compression and stacking the learning process? Excited to share the full version of Nested Learning, a new paradigm for continual learning
28
150
965
@niloofar_mire
Niloofar
2 days
Had a blast talking about privacy and agentic AI at the @farairesearch alignment workshop! 1. Stop worrying about memorization as a privacy concern 2. Optimizing for math and coding tasks is NOT going to give us models that are better for *humans*! (See graph!) Slides ⬇️
2
11
212
@HamedMahdavi93
Hamed Mahdavi
2 days
Don't do Best-of-N, do Majority-of-Bests! Follow this nice work by @AminRakhsha, @AmirKhasahmadi and @SoloGen.
@AminRakhsha
Amin Rakhsha
3 days
We are presenting our paper on test-time compute at #NeurIPS2025 🤔Running Best-of-N 1000 times and picking the most frequent answer works better than a single BoN. We make it cheap✨ Don't generate new outputs for each run. Sample with replacement from the existing ones! 🧵
0
1
12
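A minimal sketch of the Majority-of-Bests recipe quoted above, under assumptions of mine: a pool of already-generated answers scored by some reward model, a resample size n smaller than the pool, and toy data; none of the names or numbers come from the paper.

```python
import random
from collections import Counter

def best_of_n(answers, rewards, n, rng):
    """One Best-of-N run: draw n candidates *with replacement* from the
    existing pool (no new generations) and keep the highest-scoring one."""
    idx = rng.choices(range(len(answers)), k=n)
    return answers[max(idx, key=lambda i: rewards[i])]

def majority_of_bests(answers, rewards, n, runs=1000, seed=0):
    """Repeat Best-of-N many times over the same pool and return the
    answer that wins most often."""
    rng = random.Random(seed)
    winners = Counter(best_of_n(answers, rewards, n, rng) for _ in range(runs))
    return winners.most_common(1)[0][0]

# Toy pool: "42" appears often with solid scores, while "41" carries the
# single highest score, so picking the pool-wide best (one big BoN) gives "41".
answers = ["42", "42", "41", "42", "43", "42", "42", "42"]
rewards = [0.80, 0.85, 0.95, 0.75, 0.70, 0.90, 0.65, 0.78]
print(majority_of_bests(answers, rewards, n=3))  # majority of bests -> "42"
```

Because each run only resamples indices from the existing pool, the thousand repeats cost CPU bookkeeping rather than extra model calls, which is what makes the trick cheap.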
@HamedMahdavi93
Hamed Mahdavi
2 days
It’s snowing on the East Coast right now, but I’ll be in San Diego soon for NeurIPS! I work on reasoning, synthetic data, and agentic workflows for reasoning. I’m open to jobs, internships, and collaborations. Always happy to chat, whether in person or via DM😎
0
0
6
@AminRakhsha
Amin Rakhsha
3 days
We are presenting our paper on test-time compute at #NeurIPS2025 🤔Running Best-of-N 1000 times and picking the most frequent answer works better than a single BoN. We make it cheap✨ Don't generate new outputs for each run. Sample with replacement from the existing ones! 🧵
1
10
22
@AmirKhasahmadi
Amir Khasahmadi
3 days
Our new paper on LLM test-time computation! #NeurIPS2025 Majority-of-the-Bests (MoB) improves Best-of-N with negligible CPU cost. Check it out!
@AminRakhsha
Amin Rakhsha
3 days
We are presenting our paper on test-time compute at #NeurIPS2025 🤔Running Best-of-N 1000 times and picking the most frequent answer works better than a single BoN. We make it cheap✨ Don't generate new outputs for each run. Sample with replacement from the existing ones! 🧵
0
3
10
@aviral_kumar2
Aviral Kumar
8 days
🚨🚨New blog post led by CMU students: Want to know why LLM RL training plateaus on hard problems & scaling compute may not help? And how to fix this issue? Turns out it stems from a coupling of poor exploration & optimization. Classical ways to explore don't work, but ours
6
44
248
@HannaHajishirzi
Hanna Hajishirzi
16 days
Meet DR Tulu: our open deep-research agent built for long-form, open-ended deep research tasks, trained with our new RLER method. DR Tulu rivals or even outperforms proprietary deep-research systems from Perplexity and OpenAI on several benchmarks.
@allen_ai
Ai2
16 days
Today we’re releasing Deep Research Tulu (DR Tulu)—the first fully open, end-to-end recipe for long-form deep research, plus an 8B agent you can use right away. Train agents that plan, search, synthesize, & cite across sources, making expert research more accessible. 🧭📚
2
8
38
@pushmeet
Pushmeet Kohli
22 days
Last year, AlphaProof & AlphaGeometry reached a key landmark in AI by achieving silver medal level performance at the International Math Olympiad. Today, @Nature is publishing the methodology behind our amazing agent AlphaProof! @GoogleDeepMind Paper:
nature.com
Nature - Olympiad-level formal mathematical reasoning with reinforcement learning
8
84
439
@aarashfeizi
Aarash Feizi
22 days
🚀 Announcing GroundCUA, a high-quality dataset for grounding computer-use agents. With over 3M expert annotations spanning 87 desktop apps, we use our new dataset to train state-of-the-art grounding models, namely GroundNext-3B and GroundNext-7B. 👇 Thread
5
31
81
@a_kazemnejad
Amirhossein Kazemnejad
22 days
Computer-use agents don’t touch the UI anymore; they do the high-level planning and call a "grounding" agent to click and type. @aarashfeizi et al. proposed a recipe for creating SOTA grounding agents, from data collection to RL pipeline design. Check it out.
@aarashfeizi
Aarash Feizi
22 days
🚀 Announcing GroundCUA, a high-quality dataset for grounding computer-use agents. With over 3M expert annotations spanning 87 desktop apps, we use our new dataset to train state-of-the-art grounding models, namely GroundNext-3B and GroundNext-7B. 👇 Thread
0
1
17
@HamedMahdavi93
Hamed Mahdavi
22 days
LLM-generated reviews be like: "The paper presents a self-driving car. However, a key limitation is that it does not fly."
0
0
8
@niloofar_mire
Niloofar
23 days
I'm really excited about our new paper!! 📣 'Reinforcement Learning Improves Traversal of Hierarchical Knowledge in LLMs' Contrary to the belief that RL fine-tuning degrades memorized knowledge, RL-enhanced models consistently outperform base/SFT on knowledge recall by 24pp! RL teaches
18
50
421
@HamedMahdavi93
Hamed Mahdavi
25 days
1
2
23
@HamedMahdavi93
Hamed Mahdavi
25 days
This @aviral_kumar2 lecture is AMAZING.
1
29
290
@code_star
Cody Blakeney
26 days
She pivoted to midtraining research
@AutismCapital
Autism Capital 🧩
26 days
Whatever happened to Marie Kondo? Is she still sparking joy?
0
4
52
@pouria_mahdavi
Pouria Mahdavinia
27 days
Btw, I got this idea from James Martens' work, and I suggest reading it to understand how working optimizers for deep neural networks are developed: https://t.co/Oo4vZLY7cq. James Martens has also been a contributor to probably one of the first large NNs ever trained at
0
1
2