Omar Khattab
@lateinteraction
Followers: 26K
Following: 26K
Media: 457
Statuses: 11K
Asst professor @MIT EECS & CSAIL (@nlp_mit). Author of https://t.co/VgyLxl0oa1 and https://t.co/ZZaSzaRaZ7 (@DSPyOSS). Prev: CS PhD @StanfordNLP. Research @Databricks.
Cambridge, MA
Joined December 2022
we’re at the stage where some spammers are so stupid you can tell they’re not AI
This kind of analogy presumes that being the surgeon's assistant is easier or otherwise more appropriate for AI than just being the surgeon. Not that it isn’t true, but how do you know that?
Geoffrey drops a new analogy for working with AI that I really like; you're the surgeon, the AI tools are your team of surgical assistants
What does it actually take to give an LLM memory? @neural_avb explored that question by recreating the architecture described in the Mem0 paper using DSPy, showing how extraction, indexing, retrieval, and updates come together inside an agentic memory system. The video distills
correct, i’m in the dspy shitposts category now
Twitter Growth Strategy
0-500 Followers: dspy reply guy
501-2K: niche dspy bangers
2-5K: dspy thirst traps
5-10K: dspy news
10-25K: dspy thread
25-50K: dspy shitposts
50-75K: dspy fortune cookies
75-100K: dspy bangers
>100K: Get Cancelled by Big Eval
Today, we’re overjoyed to have a 25th Anniversary Reunion of @stanfordnlp. So happy to see so many of our former students back at @Stanford. And thanks to @StanfordHAI for the venue!
Personally, @chrmanning is such an inspiration and someone I have unfalteringly admired in the past 15 years or so of working in NLP. Imagine producing phd students who have, in their own right, become stars, repeatedly producing test-of-time science, being responsible for
New blog post! Simon has done (and continues to do) really foundational work for GPU codegen, and he has a lot of really important insight to share from his own work. There’s lots of perspective in this post that we learned this past year, so go read it!
Wrote a 1-year retrospective with @a1zhang on KernelBench and the journey toward automated GPU/CUDA kernel generations! Since my labmates (@anneouyang, @simran_s_arora, @_williamhu) and I first started working towards this vision around last year’s @GPU_mode hackathon, we have
@hi_ZachParent has kindly open-sourced his amazing work on using GEPA for monitoring AI-generated code safety. Check out the fully executable tutorial notebook! https://t.co/uMM4cBIT5C
Super interesting usecase with GEPA: @HopmanMia and @ParentZap find that GEPA discovers highly effective prompts for detecting malicious behavior in AI-generated code, blocking 90% malicious code submissions, at just 1% of the audit budget! https://t.co/FA31WYgPul
this updated my prior
Why override µP? Because its core assumptions only hold very early in training! In practice wide models quickly stop being more sensitive to weight updates than smaller models! This is caused by changes in the geometric alignment of updates and layer inputs over training. 🧵6/8
Trying to build good docs for DSRs (@DSPyOSS in Rust) that could also serve as a bridge to understanding DSPy conceptually. Looking for collaborators who can drive the initiative! DM if interested! P.S. posting DSRs' new release/updates on Monday!
🎉 Milestone moment — MedSAM is now the most cited paper among my 100+ publications! Huge thanks to @JunMa_11 and our amazing collaborators for making this possible. It’s incredible to see how far the idea of “Segment Anything in Medical Images” has come — from concept to
Key figure from our new paper: coverage is more predictive than KL of what model will succeed in best-of-N. Read more in Dylan's thread and at
arxiv.org
Language models demonstrate remarkable abilities when pre-trained on large text corpora and fine-tuned for specific tasks, but how and why pre-training shapes the success of the final model...
@auddery @GolowichNoah @SadhikaMalladi @jordan_t_ash (7/12) Example (see figure):
- Cross-entropy decreases throughout training.
- Coverage improves to a point, but begins to drop as the model learns a spurious shortcut.
- BoN performance follows the trend of coverage, not CE (increasing initially, dropping as the shortcut is learned).
The coverage principle: How pre-training enables post-training
New preprint where we look at the mechanisms through which next-token prediction produces models that succeed at downstream tasks. The answer involves a metric we call the "coverage profile", not cross-entropy.
DSPyWeekly Issue #8 is packed! 🚀 Highlights:
🔹 Articles: Deep dive into DSPy optimizers, building an AI ghostwriter with "taste," & new papers on anomaly detection (SAVANT) and Meaning-Typed Programming (MTP).
🔹 Videos: Tutorials on building agentic memory (Mem0) w/ QDrant,
I would be surprised if prompt optimization (e.g. @DSPyOSS) doesn't SOTA existing interpretability evals. Willing to bet it's better than steering, which is the most causal eval I know of
Hot take: prompt optimization is the future of interpretability
New video on agentic memory systems is currently out on my channel. This one discusses the challenges of long-term memory as a context engineering problem, explains the Mem0 API, and then proceeds to code the core features of Mem0 from scratch. We use DSPy to extract
@isaacbmiller1 That's what we're doing with AskRally now. We're building a virtual panel of AI personas calibrated on a real person's response. One task model and one judge per person with GEPA.
I am incredibly bullish on running prompt optimization per user or per organization. Cheap enough to run quickly and frequently, and can preserve privacy when run locally.