Omar Shaikh (@oshaikh13)
member of sociotechnical staff @Stanford
🇸🇦→🇨🇦→🇺🇸→🇸🇦→🇺🇸
Joined December 2012
2K Followers · 7K Following · 36 Media · 705 Statuses
What if LLMs could learn your habits and preferences well enough (across any context!) to anticipate your needs? In a new paper, we present the General User Model (GUM): a model of you built from just your everyday computer use. 🧵
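As a rough illustration of the GUM idea as described above: observations of everyday computer use get distilled into natural-language propositions about the user, which can later be retrieved to anticipate needs. Everything in this sketch (the `Proposition` structure, the stub inference and retrieval) is hypothetical illustration, not the paper's API.

```python
from dataclasses import dataclass, field

@dataclass
class Proposition:
    text: str          # e.g. "User is drafting a paper rebuttal"
    confidence: float  # belief that the proposition currently holds

@dataclass
class GeneralUserModel:
    propositions: list[Proposition] = field(default_factory=list)

    def observe(self, event: str) -> None:
        # A real system would use an LLM to infer propositions from the
        # raw observation; this stub just records a placeholder.
        self.propositions.append(Proposition(f"Inferred from: {event}", 0.5))

    def relevant(self, context: str, k: int = 3) -> list[Proposition]:
        # Stub retrieval: rank stored propositions by naive word overlap.
        ctx = set(context.lower().split())
        return sorted(self.propositions,
                      key=lambda p: len(ctx & set(p.text.lower().split())),
                      reverse=True)[:k]
```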
Data centers dominate AI, but they're hitting physical limits. What if the future of AI isn't just bigger data centers, but local intelligence in our hands? The viability of local AI depends on intelligence efficiency. To measure this, we propose intelligence per watt (IPW):
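Reading the metric's name literally, IPW divides some measure of task capability by power draw. A minimal sketch under that assumption; the definition and all numbers below are illustrative, not the paper's:

```python
def intelligence_per_watt(n_correct: int, n_total: int, avg_watts: float) -> float:
    """Assumed definition: task accuracy per watt of average power draw."""
    return (n_correct / n_total) / avg_watts

# Illustrative comparison: a local model at laptop-scale power vs. a more
# accurate model running on datacenter-scale hardware.
local = intelligence_per_watt(62, 100, 35.0)    # ~0.0177 accuracy/W
remote = intelligence_per_watt(84, 100, 700.0)  # ~0.0012 accuracy/W
print(f"local IPW: {local:.4f}, remote IPW: {remote:.4f}")
```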
LLMs are notorious for "hallucinating": producing confident-sounding answers that are entirely wrong. But with the right definitions, we can extract a semantic notion of "confidence" from LLMs, and this confidence turns out to be calibrated out-of-the-box in many settings (!)
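A standard way to test a claim like this is expected calibration error: bin answers by the model's stated confidence and compare each bin's average confidence to its empirical accuracy. A minimal sketch of the measurement, not the paper's exact protocol:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins: int = 10) -> float:
    """ECE: size-weighted average gap between each confidence bin's
    mean confidence and its empirical accuracy."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for i in range(n_bins):
        lo, hi = edges[i], edges[i + 1]
        if i == n_bins - 1:
            mask = (confidences >= lo) & (confidences <= hi)  # include 1.0
        else:
            mask = (confidences >= lo) & (confidences < hi)
        if mask.any():
            ece += mask.mean() * abs(confidences[mask].mean() - correct[mask].mean())
    return ece

# Perfectly calibrated toy input -> ECE of 0:
print(expected_calibration_error([0.9] * 10, [1] * 9 + [0]))
```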
✨ Very overdue update: I'll be starting as an Assistant Professor in CS at University of Minnesota, Twin Cities, Fall 2026. I will be recruiting PhD students!! Please help me spread the word! [Thread] 1/n
It's challenging to maintain data quality while preserving variation in data labels! We find that spam filtering for data annotation removes annotators who disagree instead of actual spammers, distorting data label distributions. 📄 https://t.co/ccwyvArvqV
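The failure mode is easy to reproduce in a toy simulation, sketched below with invented numbers: filtering by agreement with the majority removes honest annotators who hold a genuine minority view, collapsing the label distribution. Illustration only, not the paper's analysis.

```python
import random

random.seed(0)
# Toy data: honest annotators genuinely split 70/30 between labels A and B;
# a handful of spammers answer uniformly at random.
honest = ["A" if random.random() < 0.7 else "B" for _ in range(90)]
spam = [random.choice(["A", "B"]) for _ in range(10)]
labels = honest + spam

# Crude agreement-based "spam filter": keep only labels that match the
# majority (a stand-in for dropping low-agreement annotators).
majority = max(set(labels), key=labels.count)
kept = [lab for lab in labels if lab == majority]

print(f"P(B) before filtering: {labels.count('B') / len(labels):.2f}")
print(f"P(B) after filtering:  {kept.count('B') / len(kept):.2f}")  # 0.00
```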
No better time to learn about that #AI thing everyone's talking about... 📢 I'm recruiting PhD students in Computer Science or Information Science @Cornell_Bowers! If you're interested, apply to either department (yes, either program!) and list me as a potential advisor!
New EMNLP main paper: “Finetuning LLMs for Human Behavior Prediction in Social Science Experiments” We built SocSci210—2.9M human responses from 210 social science experiments. Finetuning Qwen2.5-14B on SocSci210 beats its base model by 26% & GPT-4o by 13% on unseen studies.🧵
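For concreteness, finetuning on a corpus like this usually means rendering each (stimulus, participant response) pair as a supervised example. A hypothetical formatting sketch; SocSci210's actual schema may differ:

```python
def to_sft_example(condition: str, question: str,
                   options: list[str], response: str) -> dict:
    """Render one participant response as a prompt/completion pair
    (hypothetical schema, for illustration only)."""
    prompt = (
        "You are a participant in a social science experiment.\n"
        f"Condition: {condition}\n"
        f"Question: {question}\n"
        f"Options: {', '.join(options)}\n"
        "Answer with exactly one option."
    )
    return {"prompt": prompt, "completion": response}

example = to_sft_example(
    condition="Dictator game, anonymous partner",
    question="How much of the $10 endowment do you give?",
    options=["$0", "$2", "$5", "$10"],
    response="$2",
)
```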
New eval! Code duels for LMs ⚔️
Current evals test LMs on *tasks*: "fix this bug," "write a test."
But we code to achieve *goals*: maximize revenue, cut costs, win users.
Meet CodeClash: LMs compete via their codebases across multi-round tournaments to achieve high-level goals.
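The framing suggests a simple outer loop, sketched below with placeholder callables (none of these names are the benchmark's API): each round, every LM revises its own codebase given the tournament history, then codebases are pitted against each other on the shared goal and scores accumulate.

```python
from itertools import combinations

def run_tournament(agents, n_rounds, edit_codebase, play_match):
    """Hypothetical CodeClash-style loop. `edit_codebase(agent, code, history)`
    returns a revised codebase; `play_match(code_a, code_b)` returns one
    score per side on the high-level goal (e.g., revenue captured)."""
    codebases = {a: "" for a in agents}  # start from empty repos
    scores = {a: 0.0 for a in agents}
    history = []
    for round_idx in range(n_rounds):
        # 1. Each LM edits its codebase, seeing past match outcomes.
        for a in agents:
            codebases[a] = edit_codebase(a, codebases[a], history)
        # 2. Round-robin: codebases compete, not single prompt completions.
        for a, b in combinations(agents, 2):
            s_a, s_b = play_match(codebases[a], codebases[b])
            scores[a] += s_a
            scores[b] += s_b
            history.append((round_idx, a, b, s_a, s_b))
    return scores
```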
Can we map out gaps in LLMs’ cultural knowledge? Check out our #EMNLP2025 talk: Culture Cartography
🗓️ 11/5, 11:30 AM
📌 A109 (CSS Orals 1)
Compared to trad. benchmarking, our mixed-initiative method finds more gaps even in reasoning models like R1!
📄 https://t.co/6RtZCuskl1
We recorded a bunch of people actually working on their computers (!) and then compared agent performance to actual human workflows. Awesome paper led by @ZhiruoW :)
Agents are joining us at work -- coding, writing, design. But how do they actually work, especially compared to humans? Their workflows tell a different story: They code everything, slow down human flows, and deliver low-quality work fast. Yet when teamed with humans, they shine
🔎Did someone steal your language model? We can tell you, as long as you shuffled your training data🔀. All we need is some text from their model! Concretely, suppose Alice trains an open-weight model and Bob uses it to produce text. Can Alice prove Bob used her model?🚨
STanFoRd cLasSes aRE oUtDaTeD 🤡
AI is already at work in American newsrooms. We examine 186k articles published this summer and find that ~9% are either fully or partially AI-generated, usually without readers having any idea. Here's what we learned about how AI is influencing local and national journalism:
Our new preprint on the ways that LLM R&D overlooks the needs of broader populations!! If LLMs are to be widely adopted, we need to move beyond "me-search". So excited that this work is finally out; I've been mentioning it in so many conversations :)
As of June 2025, 66% of Americans have never used ChatGPT. Our new position paper, Attention to Non-Adopters, explores why this matters: LLM research is being shaped around adopters, leaving non-adopters’ needs and key research opportunities behind. https://t.co/YprwsthysY
🧠 How can we equip LLMs with memory that allows them to continually learn new things? In our new paper with @AIatMeta, we show how sparsely finetuning memory layers enables targeted updates for continual learning, w/ minimal interference with existing knowledge. While full…
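Mechanically, "sparsely finetuning memory layers" suggests freezing the backbone and updating only memory-layer parameters, so new facts land in dedicated slots instead of overwriting shared weights. A minimal PyTorch-flavored sketch under that assumption (the parameter-name tag is hypothetical):

```python
import torch

def freeze_all_but_memory(model: torch.nn.Module, tag: str = "memory"):
    """Freeze every parameter except those in memory layers, identified
    here by a (hypothetical) name tag; return the trainable subset."""
    for name, param in model.named_parameters():
        param.requires_grad = tag in name
    return [p for p in model.parameters() if p.requires_grad]

# Continual-learning updates then only touch the memory parameters:
# trainable = freeze_all_but_memory(model)
# optimizer = torch.optim.AdamW(trainable, lr=1e-5)
```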
🚨 New paper on AI and copyright
Several authors have sued LLM companies for allegedly using their books without permission for model training.
👩‍⚖️ Courts, however, require empirical evidence of harm (e.g., market dilution). Our new pre-registered study addresses exactly this.
🚨 New paper alert! 🚨
Tandem Training for Language Models
https://t.co/Emzcgf1KHx
Actions & thoughts of AI w/ superhuman skills will be hard for humans to follow, undermining human oversight of AI. We propose a new way to make AI produce human-understandable solutions. How? 👉🧵
Inspired by @karpathy’s NanoChat, I created a simple experiment for running a full pre-, mid-, and post-training pipeline using Marin! It runs on a single v5p-8, so researchers can reproduce the whole thing at extremely low cost on the TPU Research Cloud!
🤖➡️📉 Post-training made LLMs better at chat and reasoning—but worse at distributional alignment, diversity, and sometimes even steering(!) We measure this with our new resource (Spectrum Suite) and introduce Spectrum Tuning (method) to bring them back into our models! 🌈 1/🧵
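One way to make "distributional alignment" concrete: the model's distribution over answers should match the human population's, not just the modal answer. Total variation distance is a common yardstick for the mismatch; a minimal sketch, not necessarily the metric Spectrum Suite uses:

```python
def total_variation(p_counts: dict, q_counts: dict) -> float:
    """TV distance between two empirical answer distributions given raw counts."""
    keys = set(p_counts) | set(q_counts)
    p_n, q_n = sum(p_counts.values()), sum(q_counts.values())
    return 0.5 * sum(abs(p_counts.get(k, 0) / p_n - q_counts.get(k, 0) / q_n)
                     for k in keys)

# Humans split 60/40 on an item, but a post-trained model answers almost
# deterministically: large TV distance despite matching the majority answer.
print(total_variation({"yes": 60, "no": 40}, {"yes": 98, "no": 2}))  # 0.38
```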
Fun fact, @Diyi_Yang has great taste in fine dining at conferences as well 🍱🧑‍🍳 #COLM2025 #professorgossip
📅 Just 4 days until LM4Sci #COLM2025! 🤖🤝🔬
🔥 The countdown continues! Today's spotlight: Diyi Yang (Stanford) @Diyi_Yang, on a Human-Centered Perspective on Automating Research 🧵