Michael Ryan
@michaelryan207
Followers: 2K · Following: 7K · Media: 34 · Statuses: 284
PhD Student @stanfordnlp || Working on DSPy 🧩 || Prev @GeorgiaTech @Microsoft @SnowflakeDB
Palo Alto, CA
Joined December 2019
New #ACL2025NLP Paper! 🎉 Curious what AI thinks about YOU? We interact with AI every day, offering all kinds of feedback, both implicit ✏️ and explicit 👍. What if we used this feedback to personalize your AI assistant to you? Introducing SynthesizeMe! An approach for
7 · 43 · 147
New eval! Code duels for LMs ⚔️ Current evals test LMs on *tasks*: "fix this bug," "write a test" But we code to achieve *goals*: maximize revenue, cut costs, win users Meet CodeClash: LMs compete via their codebases across multi-round tournaments to achieve high-level goals
29 · 91 · 368
🚨How do LLMs acquire human values?🤔 We often point to preference optimization. However, in our new work, we trace how and when model values shift during post-training and uncover surprising dynamics. We ask: How do data, algorithms, and their interaction shape model values?🧵
1 · 45 · 119
Can we map out gaps in LLMs’ cultural knowledge? Check out our #EMNLP2025 talk: Culture Cartography 🗓️ 11/5, 11:30 AM 📌 A109 (CSS Orals 1) Compared to trad. benchmarking, our mixed-initiative method finds more gaps even in reasoning models like R1! 📄 https://t.co/6RtZCuskl1
1 · 28 · 107
@ZhiruoW 's research compares AI agents vs humans across real work tasks (data analysis, engineering, design, writing). Key findings: 👉Agents are 88% faster & 90-96% cheaper 👉BUT produce lower quality work, often fabricate data to mask limitations 👉Agents code everything,
Agents are joining us at work -- coding, writing, design. But how do they actually work, especially compared to humans? Their workflows tell a different story: They code everything, slow down human flows, and deliver low-quality work fast. Yet when teamed with humans, they shine
1 · 15 · 107
STanFoRd cLasSes aRE oUtDaTeD 🤡
3 · 9 · 375
Really cool blog post! Ever suffer from “context rot” when your trajectory/context gets excessively long? Your LLM can already process long contexts well: just put the context in a REPL environment and let the LLM call itself 🔥 Let the LLM decide what context it needs!
What if scaling the context windows of frontier LLMs is much easier than it sounds? We’re excited to share our work on Recursive Language Models (RLMs). A new inference strategy where LLMs can decompose and recursively interact with input prompts of seemingly unbounded length,
2 · 5 · 69
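The recursive idea above can be sketched in a few lines. This is a minimal toy, not the paper's implementation: `mock_lm` stands in for a real LLM call, the fixed-length split is a placeholder for the model's own decomposition choices, and the hard character limit is an assumption standing in for a context window.

```python
def mock_lm(prompt: str) -> str:
    """Stand-in for an LLM call: here it just 'answers' by
    counting occurrences of the word 'error' in the prompt."""
    return str(prompt.count("error"))

def recursive_answer(context: str, limit: int = 200) -> str:
    """If the context fits in the window, call the model once.
    Otherwise split it, recurse on each half, and combine the
    partial answers with one more (here trivial) merge step."""
    if len(context) <= limit:
        return mock_lm(context)
    # A real system would split on semantic boundaries, not raw
    # character midpoints, so no word is cut in half.
    mid = len(context) // 2
    left = recursive_answer(context[:mid], limit)
    right = recursive_answer(context[mid:], limit)
    # Combine step: in a real RLM the model itself merges partials.
    return str(int(left) + int(right))
```

No single call ever sees more than `limit` characters, yet the answer covers the whole input; that is the core of the unbounded-length claim.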
If you visit me in Seattle like @aryaman2020 and @michaelryan207 I will show you what life’s all about Or at least take you to Ai2 office where the snacks are pretty good
4 · 2 · 38
Evaluating language models is tricky: how do we know if our results are real, or due to random chance? We find an answer with two simple metrics: signal, a benchmark’s ability to separate models, and noise, a benchmark’s random variability between training steps 🧵
📢 New paper from Ai2: Signal & Noise asks a simple question—can language model benchmarks detect a true difference in model performance? 🧵
4 · 53 · 236
If you liked MIPRO you’re REALLY going to like GEPA 🔥🧩 Try it out now!
Very excited to share that GEPA is now live on @DSPyOSS as dspy.GEPA! This is an early code release. We’re looking forward to community feedback, especially about any practical challenges in switching optimizers.
2 · 6 · 42
Really cool example of how iterative/reflective prompt optimization can lead to discovery of novel and highly effective strategies!
Specifically: Attacks evolve from simple direct requests → sophisticated multi-turn tactics like impersonation & consent forgery. Defenses progress from rule-based constraints → state-machine identity verification. Paper: https://t.co/hCJyRBYqJP Code: https://t.co/JoLxuJmXgR
3 · 8 · 32
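The reflective-optimization loop behind results like these can be sketched with a toy. Everything here is a stand-in: `TRAITS` and `score` replace a real validation-set metric, and `reflect` replaces the LM critique-and-rewrite step that GEPA actually performs.

```python
# Toy metric: pretend prompts mentioning these traits do better on
# our task. In practice the score comes from running the LM on a
# validation set.
TRAITS = ["be concise", "cite sources", "think step by step"]

def score(prompt: str) -> int:
    return sum(t in prompt for t in TRAITS)

def reflect(prompt: str) -> str:
    """Stand-in for the reflection step: propose a revision of the
    current best prompt. A real reflective optimizer asks an LM to
    read failure traces and rewrite the prompt accordingly."""
    missing = [t for t in TRAITS if t not in prompt]
    if not missing:
        return prompt
    return prompt + " " + missing[0] + "."

def optimize(seed_prompt: str, rounds: int = 10) -> str:
    best = seed_prompt
    for _ in range(rounds):
        candidate = reflect(best)
        if score(candidate) > score(best):  # keep only improvements
            best = candidate
    return best
```

Because each round proposes a targeted fix rather than a random rollout, the loop converges in a handful of steps, which is the intuition behind needing far fewer rollouts than policy-gradient methods.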
Getting my head around DSPy's advanced optimizers was a little tricky for me. There are many steps and parameters controlling the process. So I dug deep into MIPROv2 and wrote a guide on how to configure it in detail! While I was at it I built a website to host notes like
1 · 33 · 228
As we move into the era of GEPA & SIMBA in @DSPyOSS optimization (and others very soon 👀)… It is only appropriate that you marvel at the beauty of the past generation of prompt optimizers, which—unlike reflective optimizers—worked even for dumb old LMs, even 1B Llama! This
MIPROv2, our new state-of-the-art optimizer for LM programs, is live in DSPy @stanfordnlp! It's even faster, cheaper, and more accurate than MIPRO. MIPROv2 proposes instructions, bootstraps demonstrations, and optimizes combinations. Let’s dive into a visual 🧵of how it works!
6 · 25 · 174
Interested in converting your text LLM to a speech LLM with no instruction tuning data? 🔊 Built a speech model but not sure how to evaluate it? 🗣️ Come to @WilliamBarrHeld’s BACK TO BACK oral presentations in 1.61 starting in ~15 minutes! #ACL2025
We're very excited to release 🌟DiVA — Distilled Voice Assistant 🔊 @WilliamBarrHeld ✅End-to-end differentiable speech LM; early fusion with Whisper and Llama 3 8B ✅Improves generalization by using distillation rather than supervised loss ✅Trained only using open-access
0 · 8 · 15
Presenting this today! 🎉 #ACL2025NLP Come by Poster Session 2 in Hall X4/X5 @ 10:30-12pm today to chat with me about how much we can learn about users from their prior pairwise interactions! 🤔
New #ACL2025NLP Paper! 🎉 Curious what AI thinks about YOU? We interact with AI every day, offering all kinds of feedback, both implicit ✏️ and explicit 👍. What if we used this feedback to personalize your AI assistant to you? Introducing SynthesizeMe! An approach for
2 · 12 · 75
How does prompt optimization compare to RL algos like GRPO? GRPO needs 1000s of rollouts, but humans can learn from a few trials—by reflecting on what worked & what didn't. Meet GEPA: a reflective prompt optimizer that can outperform GRPO by up to 20% with 35x fewer rollouts!🧵
46 · 167 · 1K
We’re at #ACL2025 in Vienna!!! @WilliamBarrHeld @michaelryan207 @dorazhao9 @oshaikh13 Catch us at our poster/talk and let’s chat 🔥😀
2 · 18 · 106
I’m in Vienna for #ACL2025NLP! DMs are open - let’s chat about LLM Personalization, Evals/Metrics, AI+Culture, DSPy 🧩, and the best place to fill up this new coffee cup in the city! ☕️
1 · 11 · 84