Omar Shaikh (@oshaikh13)
member of sociotechnical staff @Stanford
🇸🇦→🇨🇦→🇺🇸→🇸🇦→🇺🇸
Joined December 2012
2K Followers · 7K Following · 36 Media · 705 Statuses
What if LLMs could learn your habits and preferences well enough (across any context!) to anticipate your needs? In a new paper, we present the General User Model (GUM): a model of you built from just your everyday computer use. 🧵
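As a rough illustration of the GUM idea as described above: observations of everyday computer use get distilled into natural-language propositions about the user, which can later be retrieved to anticipate needs. Everything in this sketch (the `Proposition` structure, the stub inference and retrieval) is hypothetical illustration, not the paper's API.

```python
from dataclasses import dataclass, field

@dataclass
class Proposition:
    text: str          # e.g. "User is drafting a paper rebuttal"
    confidence: float  # belief that the proposition currently holds

@dataclass
class GeneralUserModel:
    propositions: list[Proposition] = field(default_factory=list)

    def observe(self, event: str) -> None:
        # A real system would use an LLM to infer propositions from the
        # raw observation; this stub just records a placeholder.
        self.propositions.append(Proposition(f"Inferred from: {event}", 0.5))

    def relevant(self, context: str, k: int = 3) -> list[Proposition]:
        # Stub retrieval: rank stored propositions by naive word overlap.
        ctx = set(context.lower().split())
        return sorted(self.propositions,
                      key=lambda p: len(ctx & set(p.text.lower().split())),
                      reverse=True)[:k]
```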
Data centers dominate AI, but they're hitting physical limits. What if the future of AI isn't just bigger data centers, but local intelligence in our hands? The viability of local AI depends on intelligence efficiency. To measure this, we propose intelligence per watt (IPW):
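Reading the metric's name literally, IPW divides some measure of task capability by power draw. A minimal sketch under that assumption; the definition and all numbers below are illustrative, not the paper's:

```python
def intelligence_per_watt(n_correct: int, n_total: int, avg_watts: float) -> float:
    """Assumed definition: task accuracy per watt of average power draw."""
    return (n_correct / n_total) / avg_watts

# Illustrative comparison: a local model at laptop-scale power vs. a more
# accurate model running on datacenter-scale hardware.
local = intelligence_per_watt(62, 100, 35.0)    # ~0.0177 accuracy/W
remote = intelligence_per_watt(84, 100, 700.0)  # ~0.0012 accuracy/W
print(f"local IPW: {local:.4f}, remote IPW: {remote:.4f}")
```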
LLMs are notorious for "hallucinating": producing confident-sounding answers that are entirely wrong. But with the right definitions, we can extract a semantic notion of "confidence" from LLMs, and this confidence turns out to be calibrated out-of-the-box in many settings (!)
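A standard way to test a claim like this is expected calibration error: bin answers by the model's stated confidence and compare each bin's average confidence to its empirical accuracy. A minimal sketch of the measurement, not the paper's exact protocol:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins: int = 10) -> float:
    """ECE: size-weighted average gap between each confidence bin's
    mean confidence and its empirical accuracy."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for i in range(n_bins):
        lo, hi = edges[i], edges[i + 1]
        if i == n_bins - 1:
            mask = (confidences >= lo) & (confidences <= hi)  # include 1.0
        else:
            mask = (confidences >= lo) & (confidences < hi)
        if mask.any():
            ece += mask.mean() * abs(confidences[mask].mean() - correct[mask].mean())
    return ece

# Perfectly calibrated toy input -> ECE of 0:
print(expected_calibration_error([0.9] * 10, [1] * 9 + [0]))
```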
✨ Very overdue update: I'll be starting as an Assistant Professor in CS at University of Minnesota, Twin Cities, Fall 2026. I will be recruiting PhD students!! Please help me spread the word! [Thread] 1/n
It's challenging to maintain data quality while preserving variation in data labels! We find that spam filtering for data annotation removes annotators who disagree instead of actual spammers, distorting data label distributions. 📄 https://t.co/ccwyvArvqV
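The failure mode is easy to reproduce in a toy simulation, sketched below with invented numbers: filtering by agreement with the majority removes honest annotators who hold a genuine minority view, collapsing the label distribution. Illustration only, not the paper's analysis.

```python
import random

random.seed(0)
# Toy data: honest annotators genuinely split 70/30 between labels A and B;
# a handful of spammers answer uniformly at random.
honest = ["A" if random.random() < 0.7 else "B" for _ in range(90)]
spam = [random.choice(["A", "B"]) for _ in range(10)]
labels = honest + spam

# Crude agreement-based "spam filter": keep only labels that match the
# majority (a stand-in for dropping low-agreement annotators).
majority = max(set(labels), key=labels.count)
kept = [lab for lab in labels if lab == majority]

print(f"P(B) before filtering: {labels.count('B') / len(labels):.2f}")
print(f"P(B) after filtering:  {kept.count('B') / len(kept):.2f}")  # 0.00
```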
No better time to learn about that #AI thing everyone's talking about... 📢 I'm recruiting PhD students in Computer Science or Information Science @Cornell_Bowers! If you're interested, apply to either department (yes, either program!) and list me as a potential advisor!
New EMNLP main paper: “Finetuning LLMs for Human Behavior Prediction in Social Science Experiments” We built SocSci210—2.9M human responses from 210 social science experiments. Finetuning Qwen2.5-14B on SocSci210 beats its base model by 26% & GPT-4o by 13% on unseen studies.🧵
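For concreteness, finetuning on a corpus like this usually means rendering each (stimulus, participant response) pair as a supervised example. A hypothetical formatting sketch; SocSci210's actual schema may differ:

```python
def to_sft_example(condition: str, question: str,
                   options: list[str], response: str) -> dict:
    """Render one participant response as a prompt/completion pair
    (hypothetical schema, for illustration only)."""
    prompt = (
        "You are a participant in a social science experiment.\n"
        f"Condition: {condition}\n"
        f"Question: {question}\n"
        f"Options: {', '.join(options)}\n"
        "Answer with exactly one option."
    )
    return {"prompt": prompt, "completion": response}

example = to_sft_example(
    condition="Dictator game, anonymous partner",
    question="How much of the $10 endowment do you give?",
    options=["$0", "$2", "$5", "$10"],
    response="$2",
)
```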
New eval! Code duels for LMs ⚔️
Current evals test LMs on *tasks*: "fix this bug," "write a test."
But we code to achieve *goals*: maximize revenue, cut costs, win users.
Meet CodeClash: LMs compete via their codebases across multi-round tournaments to achieve high-level goals.
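The framing suggests a simple outer loop, sketched below with placeholder callables (none of these names are the benchmark's API): each round, every LM revises its own codebase given the tournament history, then codebases are pitted against each other on the shared goal and scores accumulate.

```python
from itertools import combinations

def run_tournament(agents, n_rounds, edit_codebase, play_match):
    """Hypothetical CodeClash-style loop. `edit_codebase(agent, code, history)`
    returns a revised codebase; `play_match(code_a, code_b)` returns one
    score per side on the high-level goal (e.g., revenue captured)."""
    codebases = {a: "" for a in agents}  # start from empty repos
    scores = {a: 0.0 for a in agents}
    history = []
    for round_idx in range(n_rounds):
        # 1. Each LM edits its codebase, seeing past match outcomes.
        for a in agents:
            codebases[a] = edit_codebase(a, codebases[a], history)
        # 2. Round-robin: codebases compete, not single prompt completions.
        for a, b in combinations(agents, 2):
            s_a, s_b = play_match(codebases[a], codebases[b])
            scores[a] += s_a
            scores[b] += s_b
            history.append((round_idx, a, b, s_a, s_b))
    return scores
```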
Can we map out gaps in LLMs’ cultural knowledge? Check out our #EMNLP2025 talk: Culture Cartography
🗓️ 11/5, 11:30 AM
📌 A109 (CSS Orals 1)
Compared to trad. benchmarking, our mixed-initiative method finds more gaps even in reasoning models like R1!
📄 https://t.co/6RtZCuskl1
We recorded a bunch of people actually working on their computers (!) and then compared agent performance to actual human workflows. Awesome paper led by @ZhiruoW :)
Agents are joining us at work -- coding, writing, design. But how do they actually work, especially compared to humans? Their workflows tell a different story: They code everything, slow down human flows, and deliver low-quality work fast. Yet when teamed with humans, they shine
🔎Did someone steal your language model? We can tell you, as long as you shuffled your training data🔀. All we need is some text from their model! Concretely, suppose Alice trains an open-weight model and Bob uses it to produce text. Can Alice prove Bob used her model?🚨
STanFoRd cLasSes aRE oUtDaTeD 🤡
AI is already at work in American newsrooms. We examine 186k articles published this summer and find that ~9% are either fully or partially AI-generated, usually without readers having any idea. Here's what we learned about how AI is influencing local and national journalism:
Our new preprint on the ways that LLM R&D overlooks the needs of broader populations!! If LLMs are to be widely adopted, we need to move beyond "me-search". So excited that this work is finally out; I've been mentioning it in so many conversations :)
As of June 2025, 66% of Americans have never used ChatGPT. Our new position paper, Attention to Non-Adopters, explores why this matters: LLM research is being shaped around adopters, leaving non-adopters’ needs and key research opportunities behind. https://t.co/YprwsthysY
🧠 How can we equip LLMs with memory that allows them to continually learn new things? In our new paper with @AIatMeta, we show how sparsely finetuning memory layers enables targeted updates for continual learning, w/ minimal interference with existing knowledge. While full…
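Mechanically, "sparsely finetuning memory layers" suggests freezing the backbone and updating only memory-layer parameters, so new facts land in dedicated slots instead of overwriting shared weights. A minimal PyTorch-flavored sketch under that assumption (the parameter-name tag is hypothetical):

```python
import torch

def freeze_all_but_memory(model: torch.nn.Module, tag: str = "memory"):
    """Freeze every parameter except those in memory layers, identified
    here by a (hypothetical) name tag; return the trainable subset."""
    for name, param in model.named_parameters():
        param.requires_grad = tag in name
    return [p for p in model.parameters() if p.requires_grad]

# Continual-learning updates then only touch the memory parameters:
# trainable = freeze_all_but_memory(model)
# optimizer = torch.optim.AdamW(trainable, lr=1e-5)
```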
🚨 New paper on AI and copyright
Several authors have sued LLM companies for allegedly using their books without permission for model training.
👩‍⚖️ Courts, however, require empirical evidence of harm (e.g., market dilution). Our new pre-registered study addresses exactly this.
🚨 New paper alert! 🚨
Tandem Training for Language Models
https://t.co/Emzcgf1KHx
Actions & thoughts of AI w/ superhuman skills will be hard for humans to follow, undermining human oversight of AI. We propose a new way to make AI produce human-understandable solutions. How? 👉🧵
Inspired by @karpathy’s NanoChat, I created a simple experiment for running a full pre-, mid-, and post-training pipeline using Marin! It runs on a single v5p-8, so researchers can reproduce the whole thing at extremely low cost on the TPU Research Cloud!
🤖➡️📉 Post-training made LLMs better at chat and reasoning—but worse at distributional alignment, diversity, and sometimes even steering(!) We measure this with our new resource (Spectrum Suite) and introduce Spectrum Tuning (method) to bring them back into our models! 🌈 1/🧵
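One way to make "distributional alignment" concrete: the model's distribution over answers should match the human population's, not just the modal answer. Total variation distance is a common yardstick for the mismatch; a minimal sketch, not necessarily the metric Spectrum Suite uses:

```python
def total_variation(p_counts: dict, q_counts: dict) -> float:
    """TV distance between two empirical answer distributions given raw counts."""
    keys = set(p_counts) | set(q_counts)
    p_n, q_n = sum(p_counts.values()), sum(q_counts.values())
    return 0.5 * sum(abs(p_counts.get(k, 0) / p_n - q_counts.get(k, 0) / q_n)
                     for k in keys)

# Humans split 60/40 on an item, but a post-trained model answers almost
# deterministically: large TV distance despite matching the majority answer.
print(total_variation({"yes": 60, "no": 40}, {"yes": 98, "no": 2}))  # 0.38
```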
Fun fact, @Diyi_Yang has great taste in fine dining at conferences as well 🍱🧑‍🍳 #COLM2025 #professorgossip
📅 Just 4 days until LM4Sci #COLM2025! 🤖🤝🔬
🔥 The countdown continues! Today's spotlight: Diyi Yang (Stanford) @Diyi_Yang, on a Human-Centered Perspective on Automating Research 🧵