Omar Shaikh Profile
Omar Shaikh

@oshaikh13

Followers: 2K · Following: 7K · Media: 36 · Statuses: 705

member of sociotechnical staff @Stanford

🇸🇦→🇨🇦→🇺🇸→🇸🇦→🇺🇸
Joined December 2012
@oshaikh13
Omar Shaikh
5 months
What if LLMs could learn your habits and preferences well enough (across any context!) to anticipate your needs? In a new paper, we present the General User Model (GUM): a model of you built from just your everyday computer use. 🧵
18 · 95 · 369
@JonSaadFalcon
Jon Saad-Falcon
18 hours
Data centers dominate AI, but they're hitting physical limits. What if the future of AI isn't just bigger data centers, but local intelligence in our hands? The viability of local AI depends on intelligence efficiency. To measure this, we propose intelligence per watt (IPW):
29 · 94 · 297
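The thread above names a metric, intelligence per watt (IPW), without spelling out its formula. A minimal sketch, assuming IPW is simply task performance divided by power draw; the paper's exact definition may differ:

```python
# Illustrative IPW: benchmark accuracy per watt of average power draw.
# This is an assumed formulation, not the paper's published definition.

def intelligence_per_watt(correct: int, total: int, avg_power_watts: float) -> float:
    """Fraction of tasks solved divided by mean power consumption (watts)."""
    if total <= 0 or avg_power_watts <= 0:
        raise ValueError("need at least one task and positive power draw")
    return (correct / total) / avg_power_watts

# Example: a local model solving 60/100 tasks at 30 W vs. a datacenter
# model solving 90/100 tasks at 900 W — the local model wins on IPW.
local = intelligence_per_watt(60, 100, 30.0)
remote = intelligence_per_watt(90, 100, 900.0)
```

Under this framing, a weaker model on efficient local hardware can deliver more intelligence per watt than a stronger model behind a datacenter.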
@PreetumNakkiran
Preetum Nakkiran
2 days
LLMs are notorious for "hallucinating": producing confident-sounding answers that are entirely wrong. But with the right definitions, we can extract a semantic notion of "confidence" from LLMs, and this confidence turns out to be calibrated out-of-the-box in many settings (!)
21 · 79 · 573
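Whether an extracted confidence is "calibrated" is usually checked with a standard diagnostic. A minimal sketch of expected calibration error (ECE) — a generic measure, not necessarily the paper's own definition: bin predictions by confidence and compare each bin's average confidence to its accuracy.

```python
# Generic expected calibration error (ECE) sketch. A score of 0 means
# confidence matches accuracy in every bin (perfectly calibrated).

def expected_calibration_error(confidences, correct, n_bins=10):
    """confidences: floats in [0, 1]; correct: 0/1 outcomes per prediction."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # clamp conf == 1.0 into last bin
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(o for _, o in b) / len(b)
        ece += (len(b) / n) * abs(avg_conf - accuracy)
    return ece
```

For example, a model that says "0.9" on ten predictions and gets nine right is perfectly calibrated on that bin (ECE ≈ 0), while one that says "1.0" and is always wrong has ECE of 1.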
@AlexanderSpangh
Alexander Spangher
2 days
✨ Very overdue update: I'll be starting as an Assistant Professor in CS at University of Minnesota, Twin Cities, Fall 2026. I will be recruiting PhD students!! Please help me spread the word! [Thread] 1/n
29 · 121 · 644
@enfleisig
Eve Fleisig @ EMNLP 2025
6 days
It's challenging to maintain data quality while preserving variation in data labels! We find that spam filtering for data annotation removes annotators who disagree instead of actual spammers, distorting data label distributions. 📄 https://t.co/ccwyvArvqV
1 · 4 · 31
@KaitlynZhou
Kaitlyn Zhou
7 days
No better time to learn about that #AI thing everyone's talking about... 📢 I'm recruiting PhD students in Computer Science or Information Science @Cornell_Bowers! If you're interested, apply to either department (yes, either program!) and list me as a potential advisor!
18 · 105 · 532
@KolluriAkaash
Akaash Kolluri
7 days
New EMNLP main paper: “Finetuning LLMs for Human Behavior Prediction in Social Science Experiments” We built SocSci210—2.9M human responses from 210 social science experiments. Finetuning Qwen2.5-14B on SocSci210 beats its base model by 26% & GPT-4o by 13% on unseen studies.🧵
2 · 8 · 29
@jyangballin
John Yang
8 days
New eval! Code duels for LMs ⚔️
Current evals test LMs on *tasks*: "fix this bug," "write a test."
But we code to achieve *goals*: maximize revenue, cut costs, win users.
Meet CodeClash: LMs compete via their codebases across multi-round tournaments to achieve high-level goals.
28 · 91 · 367
@cjziems
Caleb Ziems
9 days
Can we map out gaps in LLMs’ cultural knowledge? Check out our #EMNLP2025 talk: Culture Cartography
🗓️ 11/5, 11:30 AM
📌 A109 (CSS Orals 1)
Compared to traditional benchmarking, our mixed-initiative method finds more gaps, even in reasoning models like R1!
📄 https://t.co/6RtZCuskl1
1 · 28 · 107
@oshaikh13
Omar Shaikh
16 days
We recorded a bunch of people actually working on their computers (!) and then compared agent performance to actual human workflows. Awesome paper led by @ZhiruoW :)
@ZhiruoW
Zora Wang
16 days
Agents are joining us at work -- coding, writing, design. But how do they actually work, especially compared to humans? Their workflows tell a different story: They code everything, slow down human flows, and deliver low-quality work fast. Yet when teamed with humans, they shine
3 · 6 · 45
@SallyHZhu
Sally Zhu
21 days
🔎Did someone steal your language model? We can tell you, as long as you shuffled your training data🔀. All we need is some text from their model! Concretely, suppose Alice trains an open-weight model and Bob uses it to produce text. Can Alice prove Bob used her model?🚨
34 · 93 · 757
@oshaikh13
Omar Shaikh
22 days
STanFoRd cLasSes aRE oUtDaTeD 🤡
@Diyi_Yang
Diyi Yang
22 days
Thanks @thinkymachines for supporting Tinker access for our CS329x students on Homework 2 😉
3 · 9 · 374
@jennajrussell
Jenna Russell
22 days
AI is already at work in American newsrooms. We examine 186k articles published this summer and find that ~9% are either fully or partially AI-generated, usually without readers having any idea. Here's what we learned about how AI is influencing local and national journalism:
4 · 53 · 144
@chengmyra1
Myra Cheng
23 days
Our new preprint on the ways that LLM R&D overlooks the needs of broader populations!! If LLMs are to be widely adopted, we need to move beyond "me-search". So excited that this work is finally out; I've been mentioning it in so many conversations :)
@KaitlynZhou
Kaitlyn Zhou
23 days
As of June 2025, 66% of Americans have never used ChatGPT. Our new position paper, Attention to Non-Adopters, explores why this matters: LLM research is being shaped around adopters, leaving non-adopters’ needs and key research opportunities behind. https://t.co/YprwsthysY
1 · 2 · 23
@realJessyLin
Jessy Lin
23 days
🧠 How can we equip LLMs with memory that allows them to continually learn new things? In our new paper with @AIatMeta, we show how sparsely finetuning memory layers enables targeted updates for continual learning, w/ minimal interference with existing knowledge. While full
52 · 299 · 2K
@TuhinChakr
Tuhin Chakrabarty
23 days
🚨 New paper on AI and copyright. Several authors have sued LLM companies for allegedly using their books without permission for model training. 👩‍⚖️ Courts, however, require empirical evidence of harm (e.g., market dilution). Our new pre-registered study addresses exactly this.
9 · 173 · 526
@cervisiarius
Bob West
27 days
🚨 New paper alert! 🚨 Tandem Training for Language Models https://t.co/Emzcgf1KHx
Actions and thoughts of AI with superhuman skills will be hard for humans to follow, undermining human oversight of AI. We propose a new way to make AI produce human-understandable solutions. How? 👉🧵
4 · 23 · 67
@WilliamBarrHeld
Will Held
27 days
Inspired by @karpathy’s NanoChat, I created a simple experiment for running a full pre-/mid-/post-training run using Marin! It fits on a single v5p-8, so researchers can reproduce the whole thing at extremely low cost on the TPU Research Cloud!
1 · 5 · 17
@ma_tay_
Taylor Sorensen
1 month
🤖➡️📉 Post-training made LLMs better at chat and reasoning—but worse at distributional alignment, diversity, and sometimes even steering(!) We measure this with our new resource (Spectrum Suite) and introduce Spectrum Tuning (method) to bring them back into our models! 🌈 1/🧵
5 · 48 · 194
@yisongyue
Yisong Yue
1 month
Fun fact, @Diyi_Yang has great taste in fine dining at conferences as well 🍱🧑‍🍳#COLM2025 #professorgossip
@lm4sci
LM4SCI @ COLM2025
1 month
📅 Just 4 days until LM4Sci #COLM2025! 🤖🤝🔬 🔥 The countdown continues! Today's spotlight: Diyi Yang (Stanford) @Diyi_Yang, on a Human-Centered Perspective on Automating Research 🧵
2 · 9 · 50