smitha milli (@SmithaMilli)
3K followers · 6K following · 14 media · 151 statuses
research scientist, FAIR; opinions are my own 🥺 👉👈
nyc · Joined November 2011
Today we're releasing Community Alignment - the largest open-source dataset of human preferences for LLMs, containing ~200k comparisons from >3000 annotators in 5 countries / languages! There was a lot of research that went into this... 🧵
12 replies · 70 reposts · 330 likes
sad that they've taken the Cooperative Inverse Reinforcement Learning paper out of most AI alignment syllabi, since assistance games are such a useful formalism for thinking about how to make LLM assistants more rational
1 reply · 3 reposts · 18 likes
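For context, the assistance-game (CIRL) formalism, restated compactly from memory of Hadfield-Menell et al. (2016); check the original paper for exact notation:

```latex
% A CIRL / assistance game between a human H and a robot R:
% M = \langle S, \{A^H, A^R\}, T, \Theta, R, P_0, \gamma \rangle
\begin{align*}
  &S                        && \text{world states} \\
  &A^H, A^R                 && \text{human and robot action sets} \\
  &T(s' \mid s, a^H, a^R)   && \text{transition distribution} \\
  &\Theta                   && \text{reward parameters, observed by the human only} \\
  &R(s, a^H, a^R; \theta)   && \text{shared reward, maximized by both players} \\
  &P_0(s_0, \theta)         && \text{prior over initial state and parameters} \\
  &\gamma                   && \text{discount factor}
\end{align*}
% Because both players optimize the same R but only the human knows
% \theta, the robot's best response involves inferring \theta from
% human behavior: assistance, not pure imitation.
```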
I am on the job market this year! My research advances methods for reliable machine learning from real-world data, with a focus on healthcare. Happy to chat if this is of interest to you or your department/team.
4 replies · 49 reposts · 247 likes
One month left to apply for our postdoc position @berkeley_ai! Apply here: https://t.co/fdGyaB6f8R.
aprecruit.berkeley.edu: University of California, Berkeley is hiring. Apply now!
🚨 New postdoc position in our lab @Berkeley_EECS! 🚨 (please retweet + share with relevant candidates) We seek applicants with experience in language modeling who are excited about high-impact applications in the health and social sciences! More info in thread 1/3
0 replies · 5 reposts · 19 likes
Social media feeds today are optimized for engagement, often leading to misalignment between users' intentions and technology use. In a new paper, we introduce Bonsai, a tool to create feeds based on stated preferences, rather than predicted engagement.
1 reply · 13 reposts · 39 likes
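The paper describes the real system; below is only a toy illustration of the core idea (order the feed by stated preferences rather than by a predicted-engagement score). The `Post` fields, topic weights, and scoring function are my assumptions, not Bonsai's.

```python
# Toy sketch of stated-preference feed ranking (NOT Bonsai's actual
# implementation): the user states topic weights directly, and the
# feed is ordered by those weights instead of by predicted engagement.

from dataclasses import dataclass

@dataclass
class Post:
    text: str
    topics: dict[str, float]      # topic -> relevance in [0, 1]
    predicted_engagement: float   # what an engagement ranker would use

def stated_preference_score(post: Post, weights: dict[str, float]) -> float:
    """Score a post by the user's *stated* topic weights."""
    return sum(weights.get(t, 0.0) * rel for t, rel in post.topics.items())

def rank_feed(posts: list[Post], weights: dict[str, float]) -> list[Post]:
    # An engagement-optimized feed would sort by predicted_engagement instead.
    return sorted(posts, key=lambda p: stated_preference_score(p, weights),
                  reverse=True)

posts = [
    Post("rage-bait thread", {"outrage": 0.9}, predicted_engagement=0.95),
    Post("new alignment paper", {"research": 0.8}, predicted_engagement=0.40),
]
# The user explicitly says what they want to see:
print(rank_feed(posts, weights={"research": 1.0, "outrage": -0.5})[0].text)
# -> "new alignment paper", even though the rage-bait wins on engagement
```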
The next iteration of the Social Choice for AI Ethics and Safety workshop has been accepted and will be held at IASEAI'26 in Paris this February! https://t.co/XtdgaPdT3L
sites.google.com: Social Choice for AI Ethics and Safety 2026 Europe (SC4AI'26e) will take place at IASEAI'26 in Paris, France on February 26, 2026. The workshop is organized by Vincent Conitzer, Jobst Heitzig, and...
0 replies · 2 reposts · 9 likes
One can manipulate LLM rankings to put any model in the lead, merely by modifying the single character that separates demonstration examples. Learn more in our new paper https://t.co/D8CzSpPxMU w/ Jingtong Su, Jianyu Zhang, @karen_ullrich, and Léon Bottou. 1/3 🧵
1 reply · 3 reposts · 11 likes
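The paper has the actual attack; the hypothetical sketch below only shows the knob being turned: the judge's few-shot prompt is reassembled with different single-character separators, and an adversary picks whichever one scores their model highest. The prompt format and toy scorer are placeholders, not the paper's setup.

```python
# Hypothetical sketch of the attack surface (not the paper's code).
# An LLM-as-judge evaluation joins its in-context demonstration
# examples with a separator character; varying that one character
# can shift which candidate the judge ranks first.

DEMOS = [
    "Q: What is the capital of France? A: Paris",
    "Q: What is 2 + 2? A: 4",
]

def build_judge_prompt(candidate_answer: str, sep: str) -> str:
    # The only thing varied across prompts is the joining character.
    return sep.join(DEMOS) + sep + f"Q: Rate this answer: {candidate_answer}"

def judge_score(prompt: str) -> float:
    # Stand-in for a real LLM-judge call so the sketch runs end to
    # end; a deterministic toy score. Replace with an actual API call.
    return (hash(prompt) % 1000) / 1000.0

def best_separator(candidate_answer: str, seps: str = "\n\t ;|#") -> str:
    # The adversary picks whichever separator maximizes the judge's
    # score for the model it wants to promote.
    return max(seps, key=lambda s: judge_score(build_judge_prompt(candidate_answer, s)))

print(repr(best_separator("The mitochondria is the powerhouse of the cell.")))
```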
🚨 New preprint 🚨 Across 3 experiments (n = 3,285), we found that interacting with sycophantic (or overly agreeable) AI chatbots entrenched attitudes and led to inflated self-perceptions. Yet, people preferred sycophantic chatbots and viewed them as unbiased! Thread 🧵
4 replies · 37 reposts · 137 likes
AI always calling your ideas “fantastic” can feel inauthentic, but what are sycophancy’s deeper harms? We find that in the common use case of seeking AI advice on interpersonal situations—specifically conflicts—sycophancy makes people feel more right & less willing to apologize.
6 replies · 50 reposts · 196 likes
Excited to share that I will be starting as an Assistant Professor in CSE at UCSD (@ucsd_cse) in Fall 2026! I am currently recruiting PhD students who want to bridge theory and practice in deep learning - see here:
37 replies · 70 reposts · 525 likes
do you use Letterboxd? would you be willing to participate in a 30-min research study where you use movie recommenders based on your Letterboxd ratings? DM me! (you will receive $20 for participating)
0 replies · 3 reposts · 13 likes
📣Yale social algorithms workshop, Oct 16-17!📣 What's new in content ranking? Content moderation? How can platforms promote civility? Hosted by Yale's Institute for Foundations of Data Science. Great speakers! Submit posters by 9/22! Spread the word!
yalefds.swoogo.com: As social media algorithms increasingly mediate social experiences, there has been a rapid increase in research on the effects of how these algorithms are configured, alternatives to engagement-cen...
2 replies · 2 reposts · 11 likes
Introducing: Full-Stack Alignment 🥞 A research program dedicated to co-aligning AI systems *and* institutions with what people value. It's the most ambitious project I've ever undertaken. Here's what we're doing: 🧵
13 replies · 44 reposts · 207 likes
And this is not the end! 😉 If you want to support us in doing more of these releases, email communityalignment@meta.com (or me) with feedback on what you liked about CA and what you want to see more of. Paper: https://t.co/0XorBjggtv Dataset: huggingface.co
0 replies · 0 reposts · 30 likes
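For anyone who wants to poke at the data, a minimal sketch with the Hugging Face `datasets` library; the repo ID and split name below are placeholders, so take the real ones from the dataset page linked above.

```python
# Minimal sketch for loading the release with the Hugging Face
# `datasets` library. Repo ID and split name are placeholders.

from datasets import load_dataset

ds = load_dataset("ORG/community-alignment")  # placeholder repo ID
print(ds)              # available splits and row counts
print(ds["train"][0])  # one preference-comparison record (assumed split name)
```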
This was a big project and a collective effort -- major thanks to all the collaborators (see image) 🙏 @lilyhzhang and I will be presenting it at the ICML MoFA workshop on Friday; say hi if you want to chat more!
1 reply · 4 reposts · 19 likes
Finally, based on these insights we collect Community Alignment (CA). Features include:
- NC-sampled candidate responses
- Multilingual
- >2500 prompts annotated by >= 10 people
- Natural-language explanations for > 1/4 of choices
...and more!
1 reply · 1 repost · 10 likes
We show that using NC-sampled candidates significantly improves the ability of alignment methods to learn heterogeneous preferences: win rates jump from random chance to ~0.8 in the settings we tested.
1 reply · 0 reposts · 12 likes
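For concreteness, here is the win-rate metric as I read it (the paper defines the exact setup): the fraction of held-out comparisons where a learned reward model scores the annotator's chosen response above the rejected one, so random guessing sits at 0.5.

```python
# Held-out win rate for a learned reward model, as I read the metric
# (the paper has the precise setup): fraction of comparisons where
# the model scores the chosen response above the rejected one.

from typing import Callable

def win_rate(
    comparisons: list[tuple[str, str]],  # (chosen, rejected) pairs
    reward: Callable[[str], float],      # learned reward model
) -> float:
    wins = sum(reward(chosen) > reward(rejected)
               for chosen, rejected in comparisons)
    return wins / len(comparisons)

# Toy check with length as a stand-in "reward model":
pairs = [("a longer chosen response", "short"), ("chosen!", "x")]
print(win_rate(pairs, reward=len))  # 1.0 on this toy data
```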
To produce more diverse candidate sets, rather than sampling candidates independently, you want some kind of "negatively-correlated (NC) sampling", where drawing one candidate makes similar candidates less likely to be drawn. Turns out, prompting can implement this decently well 🤡
1 reply · 0 reposts · 14 likes
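Concretely, the prompting version can be as simple as conditioning each new draw on the candidates already generated; a hypothetical sketch (the paper's actual prompt wording differs):

```python
# Hypothetical sketch of negatively-correlated sampling via prompting
# (the paper's exact prompts differ). Instead of k independent draws,
# each new candidate is generated conditioned on the previous ones,
# with an instruction to take a perspective not yet represented.

import itertools

_canned = itertools.cycle([
    "a cautious take", "an enthusiastic take",
    "a skeptical take", "a pragmatic take",
])

def generate(llm_input: str) -> str:
    # Stand-in for a single LLM sampling call so the sketch runs;
    # swap in your model API here.
    return next(_canned)

def nc_sample(user_prompt: str, k: int = 4) -> list[str]:
    candidates: list[str] = []
    for _ in range(k):
        if not candidates:
            llm_input = user_prompt
        else:
            shown = "\n---\n".join(candidates)
            llm_input = (
                f"{user_prompt}\n\nPrevious responses:\n{shown}\n\n"
                "Write a response reflecting a perspective or set of "
                "values NOT represented above."
            )
        candidates.append(generate(llm_input))
    return candidates

print(nc_sample("Should I tell my friend a hard truth?", k=3))
```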
Intuitively, if all the candidate responses only cover one set of values, then you'll never be able to learn preferences outside of those values. it's like if someone asks me to pick between four types of apples... like hello ??? i want a mango, but you won't be measuring that
1 reply · 1 repost · 21 likes
Standard alignment methods fail to learn common human preferences (as identified from our joint human-model study) from existing preference datasets because the candidate responses that people choose from are too homogeneous, even when they are sampled from multiple models.
1 reply · 2 reposts · 18 likes
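One way to see the homogeneity problem (my illustration, not the paper's measurement): embed each candidate set and check the average pairwise cosine similarity; a set whose options all sit in the same region of embedding space cannot reveal diverging values, no matter who annotates it.

```python
# Illustration of candidate-set homogeneity (my diagnostic, not the
# paper's): average pairwise cosine similarity of candidate-response
# embeddings. A set near 1.0 can't separate annotators who hold
# different values, because every option expresses the same ones.

import numpy as np

def mean_pairwise_cosine(embeddings: np.ndarray) -> float:
    """embeddings: (n_candidates, dim) array, one row per response."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    n = len(sims)
    off_diag = sims[~np.eye(n, dtype=bool)]
    return float(off_diag.mean())

homogeneous = np.array([[1.0, 0.10], [0.9, 0.20], [1.0, 0.15]])
diverse = np.array([[1.0, 0.0], [0.0, 1.0], [-0.7, 0.7]])
print(mean_pairwise_cosine(homogeneous))  # close to 1
print(mean_pairwise_cosine(diverse))      # much lower
```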
We started by conducting a joint human study and model evaluation with 15,000 nationally representative participants from 5 countries and 21 LLMs. We found that the LLMs exhibited an *algorithmic monoculture*: they were all aligned with the same minority of human preferences.
1 reply · 5 reposts · 21 likes