smitha milli (@SmithaMilli)
3K followers · 6K following · 14 media · 151 statuses
research scientist, FAIR; opinions are my own 🥺 👉👈
nyc · Joined November 2011
Today we're releasing Community Alignment - the largest open-source dataset of human preferences for LLMs, containing ~200k comparisons from >3000 annotators in 5 countries / languages! There was a lot of research that went into this... 🧵
12 replies · 70 reposts · 330 likes
sad that they've taken the Cooperative Inverse Reinforcement Learning paper out of most AI alignment syllabi, since assistance games are such a useful formalism for thinking about how to make LLM assistants more rational
1 reply · 3 reposts · 18 likes
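For context, the assistance-game (CIRL) formalism, restated compactly from memory of Hadfield-Menell et al. (2016); check the original paper for exact notation:

```latex
% A CIRL / assistance game between a human H and a robot R:
% M = \langle S, \{A^H, A^R\}, T, \Theta, R, P_0, \gamma \rangle
\begin{align*}
  &S                        && \text{world states} \\
  &A^H, A^R                 && \text{human and robot action sets} \\
  &T(s' \mid s, a^H, a^R)   && \text{transition distribution} \\
  &\Theta                   && \text{reward parameters, observed by the human only} \\
  &R(s, a^H, a^R; \theta)   && \text{shared reward, maximized by both players} \\
  &P_0(s_0, \theta)         && \text{prior over initial state and parameters} \\
  &\gamma                   && \text{discount factor}
\end{align*}
% Because both players optimize the same R but only the human knows
% \theta, the robot's best response involves inferring \theta from
% human behavior: assistance, not pure imitation.
```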
I am on the job market this year! My research advances methods for reliable machine learning from real-world data, with a focus on healthcare. Happy to chat if this is of interest to you or your department/team.
4 replies · 49 reposts · 247 likes
One month left to apply for our postdoc position @berkeley_ai! Apply here: https://t.co/fdGyaB6f8R.
aprecruit.berkeley.edu: University of California, Berkeley is hiring. Apply now!
🚨 New postdoc position in our lab @Berkeley_EECS! 🚨 (please retweet + share with relevant candidates) We seek applicants with experience in language modeling who are excited about high-impact applications in the health and social sciences! More info in thread 1/3
0 replies · 5 reposts · 19 likes
Social media feeds today are optimized for engagement, often leading to misalignment between users' intentions and technology use. In a new paper, we introduce Bonsai, a tool to create feeds based on stated preferences, rather than predicted engagement.
1 reply · 13 reposts · 39 likes
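The paper describes the real system; below is only a toy illustration of the core idea (order the feed by stated preferences rather than by a predicted-engagement score). The `Post` fields, topic weights, and scoring function are my assumptions, not Bonsai's.

```python
# Toy sketch of stated-preference feed ranking (NOT Bonsai's actual
# implementation): the user states topic weights directly, and the
# feed is ordered by those weights instead of by predicted engagement.

from dataclasses import dataclass

@dataclass
class Post:
    text: str
    topics: dict[str, float]      # topic -> relevance in [0, 1]
    predicted_engagement: float   # what an engagement ranker would use

def stated_preference_score(post: Post, weights: dict[str, float]) -> float:
    """Score a post by the user's *stated* topic weights."""
    return sum(weights.get(t, 0.0) * rel for t, rel in post.topics.items())

def rank_feed(posts: list[Post], weights: dict[str, float]) -> list[Post]:
    # An engagement-optimized feed would sort by predicted_engagement instead.
    return sorted(posts, key=lambda p: stated_preference_score(p, weights),
                  reverse=True)

posts = [
    Post("rage-bait thread", {"outrage": 0.9}, predicted_engagement=0.95),
    Post("new alignment paper", {"research": 0.8}, predicted_engagement=0.40),
]
# The user explicitly says what they want to see:
print(rank_feed(posts, weights={"research": 1.0, "outrage": -0.5})[0].text)
# -> "new alignment paper", even though the rage-bait wins on engagement
```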
The next iteration of the Social Choice for AI Ethics and Safety workshop has been accepted and will be held at IASEAI'26 in Paris this February! https://t.co/XtdgaPdT3L
sites.google.com: Social Choice for AI Ethics and Safety 2026 Europe (SC4AI'26e) will take place at IASEAI'26 in Paris, France on February 26, 2026. The workshop is organized by Vincent Conitzer, Jobst Heitzig, and...
0 replies · 2 reposts · 9 likes
One can manipulate LLM rankings to put any model in the lead, merely by modifying the single character that separates demonstration examples. Learn more in our new paper https://t.co/D8CzSpPxMU w/ Jingtong Su, Jianyu Zhang, @karen_ullrich, and Léon Bottou. 1/3 🧵
1 reply · 3 reposts · 11 likes
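The paper has the actual attack; the hypothetical sketch below only shows the knob being turned: the judge's few-shot prompt is reassembled with different single-character separators, and an adversary picks whichever one scores their model highest. The prompt format and toy scorer are placeholders, not the paper's setup.

```python
# Hypothetical sketch of the attack surface (not the paper's code).
# An LLM-as-judge evaluation joins its in-context demonstration
# examples with a separator character; varying that one character
# can shift which candidate the judge ranks first.

DEMOS = [
    "Q: What is the capital of France? A: Paris",
    "Q: What is 2 + 2? A: 4",
]

def build_judge_prompt(candidate_answer: str, sep: str) -> str:
    # The only thing varied across prompts is the joining character.
    return sep.join(DEMOS) + sep + f"Q: Rate this answer: {candidate_answer}"

def judge_score(prompt: str) -> float:
    # Stand-in for a real LLM-judge call so the sketch runs end to
    # end; a deterministic toy score. Replace with an actual API call.
    return (hash(prompt) % 1000) / 1000.0

def best_separator(candidate_answer: str, seps: str = "\n\t ;|#") -> str:
    # The adversary picks whichever separator maximizes the judge's
    # score for the model it wants to promote.
    return max(seps, key=lambda s: judge_score(build_judge_prompt(candidate_answer, s)))

print(repr(best_separator("The mitochondria is the powerhouse of the cell.")))
```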
🚨 New preprint 🚨 Across 3 experiments (n = 3,285), we found that interacting with sycophantic (or overly agreeable) AI chatbots entrenched attitudes and led to inflated self-perceptions. Yet, people preferred sycophantic chatbots and viewed them as unbiased! Thread 🧵
4 replies · 37 reposts · 137 likes
AI always calling your ideas “fantastic” can feel inauthentic, but what are sycophancy’s deeper harms? We find that in the common use case of seeking AI advice on interpersonal situations—specifically conflicts—sycophancy makes people feel more right & less willing to apologize.
6 replies · 50 reposts · 196 likes
Excited to share that I will be starting as an Assistant Professor in CSE at UCSD (@ucsd_cse) in Fall 2026! I am currently recruiting PhD students who want to bridge theory and practice in deep learning - see here:
37 replies · 70 reposts · 525 likes
do you use Letterboxd? would you be willing to participate in a 30-min research study where you use movie recommenders based on your Letterboxd ratings? DM me! (you will receive $20 for participating)
0 replies · 3 reposts · 13 likes
📣Yale social algorithms workshop, Oct 16-17!📣 What's new in content ranking? Content moderation? How can platforms promote civility? Hosted by Yale's Institute for Foundations of Data Science. Great speakers! Submit posters by 9/22! Spread the word!
yalefds.swoogo.com: As social media algorithms increasingly mediate social experiences, there has been a rapid increase in research on the effects of how these algorithms are configured, alternatives to engagement-cen...
2 replies · 2 reposts · 11 likes
Introducing: Full-Stack Alignment 🥞 A research program dedicated to co-aligning AI systems *and* institutions with what people value. It's the most ambitious project I've ever undertaken. Here's what we're doing: 🧵
13 replies · 44 reposts · 207 likes
And this is not the end! 😉 If you want to support us in doing more of these releases, email communityalignment@meta.com (or me) with feedback on what you liked about CA and what you want to see more of. Paper: https://t.co/0XorBjggtv Dataset: huggingface.co
0 replies · 0 reposts · 30 likes
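For anyone who wants to poke at the data, a minimal sketch with the Hugging Face `datasets` library; the repo ID and split name below are placeholders, so take the real ones from the dataset page linked above.

```python
# Minimal sketch for loading the release with the Hugging Face
# `datasets` library. Repo ID and split name are placeholders.

from datasets import load_dataset

ds = load_dataset("ORG/community-alignment")  # placeholder repo ID
print(ds)              # available splits and row counts
print(ds["train"][0])  # one preference-comparison record (assumed split name)
```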
This was a big project and a collective effort -- major thanks to all the collaborators (see image) 🙏 @lilyhzhang and I will be presenting it at the ICML MoFA workshop on Friday; say hi if you want to chat more!
1 reply · 4 reposts · 19 likes
Finally, based on these insights we collect Community Alignment (CA). Features include:
- NC-sampled candidate responses
- Multilingual
- >2500 prompts annotated by >= 10 people
- Natural-language explanations for > 1/4 of choices
...and more!
1 reply · 1 repost · 10 likes
We show that using NC-sampled candidates significantly improves the ability of alignment methods to learn heterogeneous preferences: win rates jump from random chance to ~0.8 in the settings we tested.
1 reply · 0 reposts · 12 likes
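For concreteness, here is the win-rate metric as I read it (the paper defines the exact setup): the fraction of held-out comparisons where a learned reward model scores the annotator's chosen response above the rejected one, so random guessing sits at 0.5.

```python
# Held-out win rate for a learned reward model, as I read the metric
# (the paper has the precise setup): fraction of comparisons where
# the model scores the chosen response above the rejected one.

from typing import Callable

def win_rate(
    comparisons: list[tuple[str, str]],  # (chosen, rejected) pairs
    reward: Callable[[str], float],      # learned reward model
) -> float:
    wins = sum(reward(chosen) > reward(rejected)
               for chosen, rejected in comparisons)
    return wins / len(comparisons)

# Toy check with length as a stand-in "reward model":
pairs = [("a longer chosen response", "short"), ("chosen!", "x")]
print(win_rate(pairs, reward=len))  # 1.0 on this toy data
```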
To produce more diverse candidate sets, rather than sampling candidates independently, you want some kind of "negatively-correlated (NC) sampling", where drawing one candidate makes similar candidates less likely to be drawn. Turns out, prompting can implement this decently well 🤡
1 reply · 0 reposts · 14 likes
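Concretely, the prompting version can be as simple as conditioning each new draw on the candidates already generated; a hypothetical sketch (the paper's actual prompt wording differs):

```python
# Hypothetical sketch of negatively-correlated sampling via prompting
# (the paper's exact prompts differ). Instead of k independent draws,
# each new candidate is generated conditioned on the previous ones,
# with an instruction to take a perspective not yet represented.

import itertools

_canned = itertools.cycle([
    "a cautious take", "an enthusiastic take",
    "a skeptical take", "a pragmatic take",
])

def generate(llm_input: str) -> str:
    # Stand-in for a single LLM sampling call so the sketch runs;
    # swap in your model API here.
    return next(_canned)

def nc_sample(user_prompt: str, k: int = 4) -> list[str]:
    candidates: list[str] = []
    for _ in range(k):
        if not candidates:
            llm_input = user_prompt
        else:
            shown = "\n---\n".join(candidates)
            llm_input = (
                f"{user_prompt}\n\nPrevious responses:\n{shown}\n\n"
                "Write a response reflecting a perspective or set of "
                "values NOT represented above."
            )
        candidates.append(generate(llm_input))
    return candidates

print(nc_sample("Should I tell my friend a hard truth?", k=3))
```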
Intuitively, if all the candidate responses only cover one set of values, then you'll never be able to learn preferences outside of those values. it's like if someone asks me to pick between four types of apples... like hello ??? i want a mango, but you won't be measuring that
1 reply · 1 repost · 21 likes
Standard alignment methods fail to learn common human preferences (as identified from our joint human-model study) from existing preference datasets because the candidate responses that people choose from are too homogeneous, even when they are sampled from multiple models.
1 reply · 2 reposts · 18 likes
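One way to see the homogeneity problem (my illustration, not the paper's measurement): embed each candidate set and check the average pairwise cosine similarity; a set whose options all sit in the same region of embedding space cannot reveal diverging values, no matter who annotates it.

```python
# Illustration of candidate-set homogeneity (my diagnostic, not the
# paper's): average pairwise cosine similarity of candidate-response
# embeddings. A set near 1.0 can't separate annotators who hold
# different values, because every option expresses the same ones.

import numpy as np

def mean_pairwise_cosine(embeddings: np.ndarray) -> float:
    """embeddings: (n_candidates, dim) array, one row per response."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    n = len(sims)
    off_diag = sims[~np.eye(n, dtype=bool)]
    return float(off_diag.mean())

homogeneous = np.array([[1.0, 0.10], [0.9, 0.20], [1.0, 0.15]])
diverse = np.array([[1.0, 0.0], [0.0, 1.0], [-0.7, 0.7]])
print(mean_pairwise_cosine(homogeneous))  # close to 1
print(mean_pairwise_cosine(diverse))      # much lower
```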
We started by conducting a joint human study and model evaluation with 15,000 nationally representative participants from 5 countries and 21 LLMs. We found that the LLMs exhibited an *algorithmic monoculture*: they were all aligned with the same minority of human preferences.
1 reply · 5 reposts · 21 likes