Michael Ryan

@michaelryan207

Followers 2K · Following 7K · Media 34 · Statuses 284

PhD Student @stanfordnlp || Working on DSPy 🧩 || Prev @GeorgiaTech @Microsoft @SnowflakeDB

Palo Alto, CA
Joined December 2019
@michaelryan207
Michael Ryan
5 months
New #ACL2025NLP Paper! 🎉 Curious what AI thinks about YOU? We interact with AI every day, offering all kinds of feedback, both implicit ✏️ and explicit 👍.  What if we used this feedback to personalize your AI assistant to you? Introducing SynthesizeMe! An approach for
7
43
147
@jyangballin
John Yang
10 days
New eval! Code duels for LMs ⚔️ Current evals test LMs on *tasks*: "fix this bug," "write a test" But we code to achieve *goals*: maximize revenue, cut costs, win users Meet CodeClash: LMs compete via their codebases across multi-round tournaments to achieve high-level goals
29
91
368
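The tournament idea in the tweet above can be sketched as a tiny round-robin loop. Everything here is invented for illustration (the codebase dict, the `duel` stub, the fake `revenue` goal metric); a real harness like CodeClash would actually execute each codebase and measure the high-level goal.

```python
import itertools
from collections import Counter

# Toy sketch of a multi-round code tournament: each "LM" maintains a codebase,
# pairs of codebases are pitted against each other on a high-level goal, and
# wins are tallied across rounds. A real harness would run the code and score
# the goal; here a stub compares a fake 'revenue' number.

codebases = {"lm_a": {"revenue": 10}, "lm_b": {"revenue": 14}, "lm_c": {"revenue": 7}}

def duel(cb1: dict, cb2: dict) -> int:
    """Return 1 if the first codebase better achieves the goal, else -1."""
    return 1 if cb1["revenue"] >= cb2["revenue"] else -1

wins = Counter()
for _ in range(3):  # multi-round tournament
    for a, b in itertools.combinations(codebases, 2):
        if duel(codebases[a], codebases[b]) == 1:
            wins[a] += 1
        else:
            wins[b] += 1
    # Between rounds the LMs would edit their codebases; stubbed as a no-op here.

print(wins.most_common(1)[0][0])  # the codebase that best achieved the goal
```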
@bhatia_mehar
Mehar Bhatia
11 days
🚨How do LLMs acquire human values?🤔 We often point to preference optimization. However, in our new work, we trace how and when model values shift during post-training and uncover surprising dynamics. We ask: How do data, algorithms, and their interaction shape model values?🧵
1
45
119
@cjziems
Caleb Ziems
10 days
Can we map out gaps in LLMs’ cultural knowledge? Check out our #EMNLP2025 talk: Culture Cartography 🗓️ 11/5, 11:30 AM 📌 A109 (CSS Orals 1) Compared to trad. benchmarking, our mixed-initiative method finds more gaps even in reasoning models like R1! 📄 https://t.co/6RtZCuskl1
1
28
107
@Diyi_Yang
Diyi Yang
18 days
@ZhiruoW 's research compares AI agents vs humans across real work tasks (data analysis, engineering, design, writing). Key findings: 👉Agents are 88% faster & 90-96% cheaper 👉BUT produce lower quality work, often fabricate data to mask limitations 👉Agents code everything,
@ZhiruoW
Zora Wang
18 days
Agents are joining us at work -- coding, writing, design. But how do they actually work, especially compared to humans? Their workflows tell a different story: They code everything, slow down human flows, and deliver low-quality work fast. Yet when teamed with humans, they shine
1
15
107
@oshaikh13
Omar Shaikh
23 days
STanFoRd cLasSes aRE oUtDaTeD 🤡
@Diyi_Yang
Diyi Yang
24 days
Thanks @thinkymachines for supporting Tinker access for our CS329x students on Homework 2 😉
3
9
375
@michaelryan207
Michael Ryan
1 month
Really cool blog post! Ever suffer from “context rot” when your trajectory/context gets excessively long? Your LLM can already process long contexts well, just put the context in a REPL environment and let the LLM call itself 🔥 Let the LLM decide what context it needs!
@a1zhang
Alex L Zhang
1 month
What if scaling the context windows of frontier LLMs is much easier than it sounds? We’re excited to share our work on Recursive Language Models (RLMs). A new inference strategy where LLMs can decompose and recursively interact with input prompts of seemingly unbounded length,
2
5
69
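The recursion described above ("put the context in a REPL and let the LLM call itself") can be illustrated with a toy sketch. `call_lm` is a placeholder, not the real model: it just "summarizes" a chunk by returning its first line, so the recursive structure is easy to follow, and the window size is arbitrary.

```python
# Toy sketch of the recursive-inference idea behind Recursive Language Models:
# rather than feeding an arbitrarily long context into one call, the model sees
# a small window at a time and recurses into sub-spans it decides to inspect.

def call_lm(prompt: str) -> str:
    """Placeholder LM: return the first line of the prompt as its 'summary'."""
    return prompt.strip().splitlines()[0]

def recursive_answer(context: str, window: int = 200) -> str:
    """If the context fits the window, answer directly; otherwise split it,
    recurse on each half, and answer over the concatenated sub-answers."""
    if len(context) <= window:
        return call_lm(context)
    mid = len(context) // 2
    left = recursive_answer(context[:mid], window)
    right = recursive_answer(context[mid:], window)
    # The model now reasons over two short sub-answers, not the full text.
    return call_lm(left + "\n" + right)

long_doc = "\n".join(f"fact {i}" for i in range(1000))  # far beyond the window
print(recursive_answer(long_doc))  # prints "fact 0"
```

The point of the sketch is that no single call ever receives more than `window` characters, however long the original document is.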
@michaelryan207
Michael Ryan
2 months
iCLEAR your calendar! ICLR abstract deadline is coming up 👀
@iclr_conf
ICLR 2026
2 months
The Abstract deadline is soon: 11:59pm, Sep 19 (Anywhere on Earth). Don’t wait until the last minute 😉 Best of luck to everyone submitting!
0
1
24
@saurabh_shah2
Saurabh Shah
2 months
If you visit me in Seattle like @aryaman2020 and @michaelryan207 I will show you what life’s all about Or at least take you to Ai2 office where the snacks are pretty good
4
2
38
@heinemandavidj
David Heineman
3 months
Evaluating language models is tricky: how do we know if our results are real, or due to random chance? We find an answer with two simple metrics: signal, a benchmark’s ability to separate models, and noise, a benchmark’s random variability between training steps 🧵
@allen_ai
Ai2
3 months
📢 New paper from Ai2: Signal & Noise asks a simple question—can language model benchmarks detect a true difference in model performance? 🧵
4
53
236
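One plausible formalization of the signal and noise described above (the paper's exact definitions may differ, and all numbers here are hypothetical): signal is how far apart a benchmark places different models, noise is how much one model's score wobbles between adjacent training checkpoints; a benchmark is informative when signal greatly exceeds noise.

```python
import statistics

# Hypothetical benchmark scores: one final score per model, plus one model's
# scores at several late training checkpoints.
final_scores = {"model_a": 0.62, "model_b": 0.55, "model_c": 0.70}
checkpoint_scores = [0.61, 0.63, 0.60, 0.62, 0.64]  # one model, five checkpoints

# Signal: spread between models. Noise: checkpoint-to-checkpoint variability.
signal = max(final_scores.values()) - min(final_scores.values())
noise = statistics.stdev(checkpoint_scores)
snr = signal / noise  # higher = the benchmark can detect true model differences

print(f"signal={signal:.3f} noise={noise:.3f} SNR={snr:.1f}")
```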
@michaelryan207
Michael Ryan
3 months
If you liked MIPRO you’re REALLY going to like GEPA 🔥🧩 Try it out now!
@LakshyAAAgrawal
Lakshya A Agrawal
3 months
Very excited to share that GEPA is now live on @DSPyOSS as dspy.GEPA! This is an early code release. We’re looking forward to community feedback, especially about any practical challenges in switching optimizers.
2
6
42
@michaelryan207
Michael Ryan
3 months
Really cool example of how iterative/reflective prompt optimization can lead to discovery of novel and highly effective strategies!
@StevenyzZhang
Yanzhe Zhang
3 months
Specifically: Attacks evolve from simple direct requests → sophisticated multi-turn tactics like impersonation & consent forgery. Defenses progress from rule-based constraints → state-machine identity verification. Paper: https://t.co/hCJyRBYqJP Code: https://t.co/JoLxuJmXgR
3
8
32
@oshaikh13
Omar Shaikh
3 months
If you thought referencing past chats was cool, we built an MCP that lets Claude use *anything you see or do on your computer* as context.
@claudeai
Claude
3 months
Claude can now reference past chats, so you can easily pick up from where you left off.
20
32
194
@heylegacyguy
Legacy Guy
3 months
Getting my head around DSPy's advanced optimizers was a little tricky for me. There are many steps and parameters controlling the process. So I dug deep into MIPROv2 and wrote a guide on how to configure it in detail! While I was at it I built a website to host notes like
1
33
228
@lateinteraction
Omar Khattab
4 months
As we move into the era of GEPA & SIMBA in @DSPyOSS optimization (and others very soon 👀)… It is only appropriate that you marvel at the beauty of the past generation of prompt optimizers, which—unlike reflective optimizers—worked even for dumb old LMs, even 1B Llama! This
@michaelryan207
Michael Ryan
1 year
MIPROv2, our new state-of-the-art optimizer for LM programs, is live in DSPy @stanfordnlp! It's even faster, cheaper, and more accurate than MIPRO. MIPROv2 proposes instructions, bootstraps demonstrations, and optimizes combinations. Let’s dive into a visual 🧵of how it works!
6
25
174
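The three-step recipe in the tweet above (propose instructions, bootstrap demonstrations, optimize combinations) can be sketched with every component stubbed out. This is not MIPROv2 itself, which proposes instructions with an LM, bootstraps demos from the program's own traces, and searches combinations with Bayesian optimization; this toy enumerates combinations exhaustively against a fake metric, just to show the search structure.

```python
import itertools

# Candidate instructions (MIPROv2 would propose these with an LM).
instructions = [
    "Answer the question.",
    "Answer concisely, citing the passage.",
]
# Candidate demonstration sets (MIPROv2 would bootstrap these from traces).
demo_sets = [
    [],                              # zero-shot
    [("Q: 2+2?", "A: 4")],           # one bootstrapped demo
]

def evaluate(instruction: str, demos: list) -> float:
    """Toy metric standing in for accuracy on a dev set: rewards a more
    detailed instruction and the presence of demonstrations."""
    return len(instruction) / 100 + 0.5 * len(demos)

# "Optimize combinations": pick the (instruction, demos) pair with best score.
best = max(
    itertools.product(instructions, demo_sets),
    key=lambda combo: evaluate(*combo),
)
print("best instruction:", best[0])
print("best #demos:", len(best[1]))
```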
@michaelryan207
Michael Ryan
4 months
Interested in converting your text LLM to a speech LLM with no instruction tuning data? 🔊 Built a speech model but not sure how to evaluate it? 🗣️ Come to @WilliamBarrHeld’s BACK TO BACK oral presentations in 1.61 starting in ~15 minutes! #ACL2025
@Diyi_Yang
Diyi Yang
1 year
We're very excited to release 🌟DiVA — Distilled Voice Assistant 🔊 @WilliamBarrHeld ✅End-to-end differentiable speech LM; early fusion with Whisper and Llama 3 8B ✅Improves generalization by using distillation rather than supervised loss ✅Trained only using open-access
0
8
15
@michaelryan207
Michael Ryan
4 months
For anyone who can't make it here's the full poster.
0
1
3
@michaelryan207
Michael Ryan
4 months
Presenting this today! 🎉 #ACL2025NLP Come by Poster Session 2 in Hall X4/X5 @ 10:30-12pm today to chat with me about how much we can learn about users from their prior pairwise interactions! 🤔
@michaelryan207
Michael Ryan
5 months
New #ACL2025NLP Paper! 🎉 Curious what AI thinks about YOU? We interact with AI every day, offering all kinds of feedback, both implicit ✏️ and explicit 👍.  What if we used this feedback to personalize your AI assistant to you? Introducing SynthesizeMe! An approach for
2
12
75
@LakshyAAAgrawal
Lakshya A Agrawal
4 months
How does prompt optimization compare to RL algos like GRPO? GRPO needs 1000s of rollouts, but humans can learn from a few trials—by reflecting on what worked & what didn't. Meet GEPA: a reflective prompt optimizer that can outperform GRPO by up to 20% with 35x fewer rollouts!🧵
46
167
1K
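The contrast drawn above (reflecting on a few trials versus thousands of rollouts) can be sketched as a minimal reflective loop. This is heavily simplified and not GEPA itself: the real optimizer has an LM reflect in natural language on execution traces and propose targeted prompt edits, while here "reflection" just appends a canned hint whenever the metric flags a gap, and a candidate is kept only if its score improves.

```python
def score(prompt: str) -> int:
    """Stub metric: rewards prompts that mention both steps and the format."""
    return ("step by step" in prompt) + ("answer with JSON" in prompt)

def reflect(prompt: str) -> str:
    """Stub reflection: inspect what the metric 'failed' and patch the prompt."""
    if "step by step" not in prompt:
        return prompt + " Think step by step."
    if "answer with JSON" not in prompt:
        return prompt + " Then answer with JSON."
    return prompt

prompt = "Solve the problem."
for _ in range(5):  # a handful of rollouts, not thousands
    candidate = reflect(prompt)
    if score(candidate) > score(prompt):  # keep only improving edits
        prompt = candidate

print(prompt)
```

The loop converges in two iterations here because each reflection fixes exactly one diagnosed failure, which is the intuition behind sample-efficient reflective optimization.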
@Diyi_Yang
Diyi Yang
4 months
We’re at #ACL2025 in Vienna!!! @WilliamBarrHeld @michaelryan207 @dorazhao9 @oshaikh13 Catch us at our poster/talk and let’s chat 🔥😀
2
18
106
@michaelryan207
Michael Ryan
4 months
I’m in Vienna for #ACL2025NLP! DMs are open - let’s chat about LLM Personalization, Evals/Metrics, AI+Culture, DSPy 🧩, and the best place to fill up this new coffee cup in the city! ☕️
1
11
84