Michael Ryan
@michaelryan207
Followers: 2K · Following: 7K · Media: 34 · Statuses: 284
PhD Student @stanfordnlp || Working on DSPy 🧩 || Prev @GeorgiaTech @Microsoft @SnowflakeDB
Palo Alto, CA
Joined December 2019
New #ACL2025NLP Paper! 🎉 Curious what AI thinks about YOU? We interact with AI every day, offering all kinds of feedback, both implicit ✏️ and explicit 👍. What if we used this feedback to personalize your AI assistant to you? Introducing SynthesizeMe! An approach for
7 · 43 · 147
New eval! Code duels for LMs ⚔️ Current evals test LMs on *tasks*: "fix this bug," "write a test" But we code to achieve *goals*: maximize revenue, cut costs, win users Meet CodeClash: LMs compete via their codebases across multi-round tournaments to achieve high-level goals
29 · 91 · 368
🚨How do LLMs acquire human values?🤔 We often point to preference optimization. However, in our new work, we trace how and when model values shift during post-training and uncover surprising dynamics. We ask: How do data, algorithms, and their interaction shape model values?🧵
1 · 45 · 119
Can we map out gaps in LLMs’ cultural knowledge? Check out our #EMNLP2025 talk: Culture Cartography 🗓️ 11/5, 11:30 AM 📌 A109 (CSS Orals 1) Compared to trad. benchmarking, our mixed-initiative method finds more gaps even in reasoning models like R1! 📄 https://t.co/6RtZCuskl1
1 · 28 · 107
@ZhiruoW 's research compares AI agents vs humans across real work tasks (data analysis, engineering, design, writing). Key findings: 👉Agents are 88% faster & 90-96% cheaper 👉BUT produce lower quality work, often fabricate data to mask limitations 👉Agents code everything,
Agents are joining us at work -- coding, writing, design. But how do they actually work, especially compared to humans? Their workflows tell a different story: They code everything, slow down human flows, and deliver low-quality work fast. Yet when teamed with humans, they shine
1 · 15 · 107
STanFoRd cLasSes aRE oUtDaTeD 🤡
3 · 9 · 375
Really cool blog post! Ever suffer from “context rot” when your trajectory/context gets excessively long? Your LLM can already process long contexts well: just put the context in a REPL environment and let the LLM call itself 🔥 Let the LLM decide what context it needs!
What if scaling the context windows of frontier LLMs is much easier than it sounds? We’re excited to share our work on Recursive Language Models (RLMs). A new inference strategy where LLMs can decompose and recursively interact with input prompts of seemingly unbounded length,
2 · 5 · 69
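The recursive idea above can be sketched in a few lines. This is a minimal toy, not the paper's implementation: `mock_lm` stands in for a real LLM call, the fixed-length split is a placeholder for the model's own decomposition choices, and the hard character limit is an assumption standing in for a context window.

```python
def mock_lm(prompt: str) -> str:
    """Stand-in for an LLM call: here it just 'answers' by
    counting occurrences of the word 'error' in the prompt."""
    return str(prompt.count("error"))

def recursive_answer(context: str, limit: int = 200) -> str:
    """If the context fits in the window, call the model once.
    Otherwise split it, recurse on each half, and combine the
    partial answers with one more (here trivial) merge step."""
    if len(context) <= limit:
        return mock_lm(context)
    # A real system would split on semantic boundaries, not raw
    # character midpoints, so no word is cut in half.
    mid = len(context) // 2
    left = recursive_answer(context[:mid], limit)
    right = recursive_answer(context[mid:], limit)
    # Combine step: in a real RLM the model itself merges partials.
    return str(int(left) + int(right))
```

No single call ever sees more than `limit` characters, yet the answer covers the whole input; that is the core of the unbounded-length claim.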
If you visit me in Seattle like @aryaman2020 and @michaelryan207 I will show you what life’s all about Or at least take you to Ai2 office where the snacks are pretty good
4 · 2 · 38
Evaluating language models is tricky: how do we know if our results are real, or due to random chance? We find an answer with two simple metrics: signal, a benchmark’s ability to separate models, and noise, a benchmark’s random variability between training steps 🧵
📢 New paper from Ai2: Signal & Noise asks a simple question—can language model benchmarks detect a true difference in model performance? 🧵
4 · 53 · 236
If you liked MIPRO you’re REALLY going to like GEPA 🔥🧩 Try it out now!
Very excited to share that GEPA is now live on @DSPyOSS as dspy.GEPA! This is an early code release. We’re looking forward to community feedback, especially about any practical challenges in switching optimizers.
2 · 6 · 42
Really cool example of how iterative/reflective prompt optimization can lead to discovery of novel and highly effective strategies!
Specifically: Attacks evolve from simple direct requests → sophisticated multi-turn tactics like impersonation & consent forgery. Defenses progress from rule-based constraints → state-machine identity verification. Paper: https://t.co/hCJyRBYqJP Code: https://t.co/JoLxuJmXgR
3 · 8 · 32
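The reflective-optimization loop behind results like these can be sketched with a toy. Everything here is a stand-in: `TRAITS` and `score` replace a real validation-set metric, and `reflect` replaces the LM critique-and-rewrite step that GEPA actually performs.

```python
# Toy metric: pretend prompts mentioning these traits do better on
# our task. In practice the score comes from running the LM on a
# validation set.
TRAITS = ["be concise", "cite sources", "think step by step"]

def score(prompt: str) -> int:
    return sum(t in prompt for t in TRAITS)

def reflect(prompt: str) -> str:
    """Stand-in for the reflection step: propose a revision of the
    current best prompt. A real reflective optimizer asks an LM to
    read failure traces and rewrite the prompt accordingly."""
    missing = [t for t in TRAITS if t not in prompt]
    if not missing:
        return prompt
    return prompt + " " + missing[0] + "."

def optimize(seed_prompt: str, rounds: int = 10) -> str:
    best = seed_prompt
    for _ in range(rounds):
        candidate = reflect(best)
        if score(candidate) > score(best):  # keep only improvements
            best = candidate
    return best
```

Because each round proposes a targeted fix rather than a random rollout, the loop converges in a handful of steps, which is the intuition behind needing far fewer rollouts than policy-gradient methods.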
Getting my head around DSPy's advanced optimizers was a little tricky for me. There are many steps and parameters controlling the process. So I dug deep into MIPROv2 and wrote a guide on how to configure it in detail! While I was at it I built a website to host notes like
1 · 33 · 228
As we move into the era of GEPA & SIMBA in @DSPyOSS optimization (and others very soon 👀)… It is only appropriate that you marvel at the beauty of the past generation of prompt optimizers, which—unlike reflective optimizers—worked even for dumb old LMs, even 1B Llama! This
MIPROv2, our new state-of-the-art optimizer for LM programs, is live in DSPy @stanfordnlp! It's even faster, cheaper, and more accurate than MIPRO. MIPROv2 proposes instructions, bootstraps demonstrations, and optimizes combinations. Let’s dive into a visual 🧵of how it works!
6 · 25 · 174
Interested in converting your text LLM to a speech LLM with no instruction tuning data? 🔊 Built a speech model but not sure how to evaluate it? 🗣️ Come to @WilliamBarrHeld’s BACK TO BACK oral presentations in 1.61 starting in ~15 minutes! #ACL2025
We're very excited to release 🌟DiVA — Distilled Voice Assistant 🔊 @WilliamBarrHeld ✅End-to-end differentiable speech LM; early fusion with Whisper and Llama 3 8B ✅Improves generalization by using distillation rather than supervised loss ✅Trained only using open-access
0 · 8 · 15
Presenting this today! 🎉 #ACL2025NLP Come by Poster Session 2 in Hall X4/X5 @ 10:30-12pm today to chat with me about how much we can learn about users from their prior pairwise interactions! 🤔
New #ACL2025NLP Paper! 🎉 Curious what AI thinks about YOU? We interact with AI every day, offering all kinds of feedback, both implicit ✏️ and explicit 👍. What if we used this feedback to personalize your AI assistant to you? Introducing SynthesizeMe! An approach for
2 · 12 · 75
How does prompt optimization compare to RL algos like GRPO? GRPO needs 1000s of rollouts, but humans can learn from a few trials—by reflecting on what worked & what didn't. Meet GEPA: a reflective prompt optimizer that can outperform GRPO by up to 20% with 35x fewer rollouts!🧵
46 · 167 · 1K
We’re at #ACL2025 in Vienna!!! @WilliamBarrHeld @michaelryan207 @dorazhao9 @oshaikh13 Catch us at our poster/talk and let’s chat 🔥😀
2 · 18 · 106
I’m in Vienna for #ACL2025NLP! DMs are open - let’s chat about LLM Personalization, Evals/Metrics, AI+Culture, DSPy 🧩, and the best place to fill up this new coffee cup in the city! ☕️
1 · 11 · 84