Azalia Mirhoseini Profile
Azalia Mirhoseini

@Azaliamirh

Followers
14K
Following
2K
Media
35
Statuses
350

Asst. Prof. of CS at Stanford, Google DeepMind. Prev: Anthropic, Google Brain. Co-Creator of MoEs, AlphaChip, Test Time Scaling Laws.

Stanford, CA
Joined May 2013
@Azaliamirh
Azalia Mirhoseini
3 months
Excited to release SWiRL: A synthetic data generation and multi-step RL approach for reasoning and tool use! With SWiRL, the model’s capability generalizes to new tasks and tools. For example, a model trained to use a retrieval tool to solve multi-hop knowledge-intensive
5
74
395
@Azaliamirh
Azalia Mirhoseini
3 days
So happy to see the strong interest in KernelBench, our AI for AI acceleration benchmark! The team has released some updates today:
@anneouyang
Anne Ouyang
3 days
KernelBench v0.1 is out, featuring:
- A guideline on analyzing the validity of results and ruling out physically impossible performance claims
- Support for randomized testing beyond normal distributions
- Fixed problem sizes and improved numerics
1
6
54
@Azaliamirh
Azalia Mirhoseini
13 days
RT @lmthang: Very excited to share that an advanced version of Gemini Deep Think is the first to have achieved gold-medal level in the Inte…
0
229
0
@Azaliamirh
Azalia Mirhoseini
13 days
RT @quocleix: Excited to share that a scaled up version of Gemini DeepThink achieves gold-medal standard at the International Mathematical…
deepmind.google
Our advanced model officially achieved a gold-medal level performance on problems from the International Mathematical Olympiad (IMO), the world’s most prestigious competition for young...
0
50
0
@Azaliamirh
Azalia Mirhoseini
17 days
RT @RylanSchaeffer: If you want to learn about the power (laws) of large language monkeys (and get a free banana 🍌), come to our poster at…
0
6
0
@Azaliamirh
Azalia Mirhoseini
17 days
RT @willccbb: cant stop thinking about this one. insanely elegant, seems insanely powerful
0
54
0
@Azaliamirh
Azalia Mirhoseini
18 days
RT @simonguozirui: At #ICML2025 in Vancouver 🇨🇦 this week, presenting some work from my first year at Stanford! Come find me at posters or…
0
14
0
@Azaliamirh
Azalia Mirhoseini
18 days
Looking forward to attending ICML! Here are some works on memory/long context, verification, kernel design, multi-model AI systems, and theoretical understanding of test-time scaling from my awesome students and collaborators!
3
17
86
@Azaliamirh
Azalia Mirhoseini
1 month
RT @CaiaCostello: So excited to speak tomorrow about Think Prune Train at LAD'25 session on Reasoning and Self Improvement!
iclad.ai
0
1
0
@Azaliamirh
Azalia Mirhoseini
1 month
RT @oscrhong: Interesting tidbit from prof @chrmanning: The first mention of “Large Language Model” comes from a 1998 NLP workshop Taiwan!…
0
6
0
@Azaliamirh
Azalia Mirhoseini
1 month
RT @iScienceLuvr: Shrinking the Generation-Verification Gap with Weak Verifiers. "we introduce Weaver, a framework for designing a strong v…
0
27
0
@Azaliamirh
Azalia Mirhoseini
1 month
RT @ajratner: Very exciting work on using weak supervision for RL, closing the “generation-verification gap”!! Once again, principled appr…
0
7
0
@Azaliamirh
Azalia Mirhoseini
1 month
RT @Azaliamirh: See @JonSaadFalcon's post for more details: Paper: .Blog: .
huggingface.co
0
3
0
@Azaliamirh
Azalia Mirhoseini
1 month
See @JonSaadFalcon's post for more details: Paper: .Blog: Datasets and Models:
huggingface.co
@JonSaadFalcon
Jon Saad-Falcon
1 month
How can we close the generation-verification gap when LLMs produce correct answers but fail to select them? 🧵 Introducing Weaver: a framework that combines multiple weak verifiers (reward models + LM judges) to achieve o3-mini-level accuracy with much cheaper non-reasoning
0
3
13
@Azaliamirh
Azalia Mirhoseini
1 month
Introducing Weaver, a test time scaling method for verification! Weaver shrinks the generation-verification gap through a low-overhead weak-to-strong optimization of a mixture of verifiers (e.g., LM judges and reward models). The Weavered mixture can be distilled into a tiny
3
50
225
@Azaliamirh
Azalia Mirhoseini
2 months
Congratulations, @CaiaCostello and Adrian!
@simonguozirui
Simon Guo
2 months
So proud of @CaiaCostello who graduated with her CS masters from @stanfordeng 🎓 today! Lucky to have helped her with the TPT project along with @annadgoldie and @Azaliamirh. This is from her presenting the TPT poster at ICLR 🇸🇬 workshop!
2
1
16
@Azaliamirh
Azalia Mirhoseini
2 months
Congratulations, Dr. Goldie! @annadgoldie
@chrmanning
Christopher Manning
2 months
Huge congratulations to @annadgoldie on receiving her @Stanford PhD today! It’s been a great journey!
1
1
61
@Azaliamirh
Azalia Mirhoseini
2 months
RT @soumithchintala: This is a proper Vibe-coding setup for GPU programmers, and can result in getting surprisingly far! I honestly think…
0
36
0
@Azaliamirh
Azalia Mirhoseini
2 months
Go, @realSharonZhou and team! Congrats to @LisaSu and AMD on such an amazing addition!
@LisaSu
Lisa Su
2 months
Welcome aboard @realSharonZhou! So happy to have you and the team joining us as we bring @AIatAMD to the world!!!
1
0
18
@Azaliamirh
Azalia Mirhoseini
2 months
RT @teortaxesTex: I like this idea very much and have long advocated for something like this. Synthetically enriched «KV prefix» is a natur…
0
17
0