Sonia Murthy
@soniakmurthy
Followers: 329 · Following: 136 · Media: 17 · Statuses: 49
cs phd student @harvard · prev predoc @allen_ai, ra @cocosci_lab, undergrad @princeton · she/her
Joined May 2022
📝 New paper! Two strategies have emerged for controlling LLM behavior at inference time: in-context learning (ICL; i.e. prompting) and activation steering. We propose that both can be understood as altering model beliefs, formally in the sense of Bayesian belief updating. 1/9
Replies: 8 · Reposts: 21 · Likes: 125
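As a rough illustration of that framing (the notation here is mine, not necessarily the paper's): treat the prompt as evidence about a latent concept the model holds a prior over, so conditioning on the prompt is a Bayesian belief update, and predictions marginalize over the updated beliefs.

```latex
% Illustrative notation (an assumption, not necessarily the paper's):
% \theta = latent concept / task variable, D = the prompt (in-context evidence).
\begin{align}
  p(\theta \mid D) &\propto p(D \mid \theta)\, p(\theta), \\
  p(y \mid x, D)   &= \int p(y \mid x, \theta)\, p(\theta \mid D)\, d\theta .
\end{align}
```

Under this reading, prompting supplies the evidence D explicitly, while activation steering can be seen as nudging the model's internal state toward the beliefs such evidence would induce.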
Zach did a stellar job on our new paper looking at what recipes make for language models that are representationally aligned with humans! Read his tweetprint and recruit him for grad school!
We’re drowning in language models — there are over 2 million of them on Hugging Face! Can we use some of them to understand which computational ingredients (architecture, scale, post-training, etc.) help us build models that align with human representations? Read on to find out 🧵
Replies: 2 · Reposts: 4 · Likes: 4
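For context, one common way to quantify human-model representational alignment (not necessarily the exact protocol in this paper) is representational similarity analysis: correlate the pairwise similarity structure of model embeddings with human similarity judgments over the same stimuli. A minimal sketch, with random data standing in for real stimuli:

```python
# Hedged sketch: representational similarity analysis (RSA) as one way to
# score human-model alignment. Illustrative only, not the paper's protocol.
import numpy as np
from scipy.stats import spearmanr
from scipy.spatial.distance import pdist


def rsa_alignment(model_embeddings: np.ndarray, human_similarity: np.ndarray) -> float:
    """Spearman correlation between model and human pairwise dissimilarities.

    model_embeddings: (n_stimuli, d) array of model representations.
    human_similarity: (n_stimuli, n_stimuli) matrix of human similarity ratings.
    """
    model_dists = pdist(model_embeddings, metric="cosine")   # condensed upper triangle
    iu = np.triu_indices_from(human_similarity, k=1)
    human_dists = 1.0 - human_similarity[iu]                 # similarity -> dissimilarity
    rho, _ = spearmanr(model_dists, human_dists)
    return rho


# Example with random data standing in for real stimuli:
rng = np.random.default_rng(0)
emb = rng.normal(size=(20, 64))                              # fake model embeddings
human = np.corrcoef(rng.normal(size=(20, 30)))               # fake human similarity matrix
print(f"RSA alignment (illustrative): {rsa_alignment(emb, human):.3f}")
```

Higher correlations mean the model carves up the stimulus space more like people do.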
Thanks to my lovely collaborators @rosieyzh, @_jennhu, @ShamKakade6, @m_wulfmeier, Peng Qian, and @TomerUllman and the Kempner Institute! 🧠 [end]
Replies: 0 · Reposts: 0 · Likes: 1
We also trace the evolution of value trade-offs during alignment by evaluating model checkpoints across 8 unique base model × feedback dataset × alignment algorithm combinations. We see the largest shifts in values early in training, with the strongest effects coming from the choice of base model.
Replies: 1 · Reposts: 0 · Likes: 2
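A minimal sketch of that kind of sweep, with placeholder model, dataset, and algorithm names (none of these are the paper's actual configurations): score value trade-offs at several training checkpoints for every base model × feedback dataset × alignment algorithm combination.

```python
# Hedged sketch of the checkpoint sweep described above. All names
# (models, datasets, algorithms, steps) are placeholders.
from itertools import product

base_models = ["base-A", "base-B"]
feedback_datasets = ["feedback-1", "feedback-2"]
alignment_algorithms = ["dpo", "ppo"]            # 2 x 2 x 2 = 8 unique combinations
checkpoint_steps = [100, 500, 1000, 5000]


def evaluate_value_tradeoffs(model: str, dataset: str, algorithm: str, step: int) -> dict:
    """Placeholder for scoring a checkpoint's honesty/kindness/face trade-offs."""
    return {"informational": 0.0, "social": 0.0}  # stub values


for model, dataset, algorithm in product(base_models, feedback_datasets, alignment_algorithms):
    for step in checkpoint_steps:
        scores = evaluate_value_tradeoffs(model, dataset, algorithm, step)
        print(model, dataset, algorithm, step, scores)
```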
We use a cognitive model of polite speech to identify how models trade off being honest, being kind, and saving face. We find that a small reasoning budget reinforces models' default, information-biased trade-offs, and that sycophantic value patterns easily emerge via prompting.
Replies: 1 · Reposts: 0 · Likes: 1
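For intuition, here is a toy RSA-style politeness model in the spirit of that cognitive-modeling tradition (the utterances, utilities, and weights are illustrative assumptions, not the paper's actual setup): a speaker trades off informational utility (honesty) against social utility (kindness), and shifting the weights turns an honest speaker into a sycophantic one.

```python
# Hedged sketch of an RSA-style polite-speech trade-off. Illustrative only.
import numpy as np

utterances = ["terrible", "bad", "okay", "good", "amazing"]
states = np.arange(5)        # possible true qualities, 0 (worst) .. 4 (best)
true_state = 1               # suppose the work really was "bad"


def literal_listener(utterance_idx: int) -> np.ndarray:
    """Toy literal listener: puts most probability mass near the state the word names."""
    probs = np.exp(-0.5 * (states - utterance_idx) ** 2)
    return probs / probs.sum()


def speaker_utility(utterance_idx: int, phi_inf: float, phi_soc: float) -> float:
    listener = literal_listener(utterance_idx)
    informational = np.log(listener[true_state] + 1e-9)   # honesty: make the true state recoverable
    social = float(listener @ states) / states.max()      # kindness: imply a flattering state
    return phi_inf * informational + phi_soc * social


def speaker_distribution(phi_inf: float, phi_soc: float, alpha: float = 5.0) -> np.ndarray:
    utils = np.array([speaker_utility(i, phi_inf, phi_soc) for i in range(len(utterances))])
    probs = np.exp(alpha * (utils - utils.max()))          # softmax choice rule
    return probs / probs.sum()


# An information-biased speaker vs. a socially biased (sycophantic) one:
for phi_inf, phi_soc, label in [(1.0, 0.1, "honest"), (0.05, 1.0, "sycophantic")]:
    dist = speaker_distribution(phi_inf, phi_soc)
    print(label, dict(zip(utterances, dist.round(2))))
```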
Paper: https://t.co/feDaH3RvKY
Kempner Deeper Learning blog feature: https://t.co/JiEqGgeiPx
Code: https://t.co/vshmvaIS4f
Brief highlights below!
github.com · skmur/many-wolves: Using cognitive models to reveal value trade-offs in language models
Replies: 1 · Reposts: 0 · Likes: 2
Excited to present our new paper as a spotlight talk 🌟 at the Pragmatic Reasoning in LMs workshop at #COLM2025 this Friday! 🍁 Come by room 520B @ 11:30am tomorrow to learn more about how LLMs' pluralistic values evolve over reasoning budgets and alignment 🧵
Replies: 1 · Reposts: 5 · Likes: 28
In our new paper, we ask whether language models solve compositional tasks using compositional mechanisms. 🧵
Replies: 4 · Reposts: 27 · Likes: 184
Presenting this today (5/1) at the 4pm poster session (Hall 3) at #NAACL2025! Come chat about alignment, personalization, and all things cognitive science 🐟
(1/9) Excited to share my recent work on "Alignment reduces LMs' conceptual diversity" with @TomerUllman and @jennhu, to appear at #NAACL2025! 🐟 We want models that match our values...but could this hurt their diversity of thought? Preprint: https://t.co/C4icfhCDGz
Replies: 0 · Reposts: 2 · Likes: 21
NEW blog post: Do modern #LLMs capture the conceptual diversity of human populations? #KempnerInstitute researchers find that #alignment reduces the conceptual diversity of language models. Read more: https://t.co/CbzUj5dIkF
@soniakmurthy @tomerullman @_jennhu
Replies: 0 · Reposts: 4 · Likes: 21
Many thanks to my collaborators and @KempnerInst for helping make this idea come to life!🌱
Replies: 0 · Reposts: 1 · Likes: 2
(9/9) Code and data for our experiments can be found at: https://t.co/CSicsUKs64 Preprint: https://t.co/C4icfhCDGz Also, check out our feature in the @KempnerInst Deeper Learning Blog!
kempnerinstitute.harvard.edu · As large language models (LLMs) have become more sophisticated, there’s been growing interest in using LLM-generated responses in place of human data for tasks such as polling, user studies, and […]
Replies: 1 · Reposts: 0 · Likes: 0
(8/9) We think that better understanding such trade-offs will be important for building LLMs that are aligned with human values: human values are diverse, and our models should be too.
Replies: 1 · Reposts: 0 · Likes: 0
(7/9) This suggests a trade-off: increasing model safety in terms of value alignment decreases safety in terms of diversity of thought and opinion.
Replies: 1 · Reposts: 0 · Likes: 0
(6/9) We put a suite of aligned models, and their instruction fine-tuned counterparts, to the test and found:
* No model reaches human-like diversity of thought.
* Aligned models show LESS conceptual diversity than their instruction fine-tuned counterparts.
Replies: 1 · Reposts: 0 · Likes: 0
(5/9) Our experiments are inspired by human studies in two domains with rich behavioral data.
Replies: 1 · Reposts: 0 · Likes: 0
(4/9) We introduce a new way of measuring the conceptual diversity of synthetically generated LLM “populations”: we consider how the variability of a population's “individuals” relates to the variability of the population as a whole.
Replies: 1 · Reposts: 0 · Likes: 0
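One way to make that idea concrete (an illustrative statistic, not necessarily the paper's exact measure): compare how much simulated individuals differ from one another to how much each individual varies across its own repeated responses.

```python
# Hedged sketch of a conceptual-diversity score: between-individual variability
# relative to within-individual variability. Illustrative only.
import numpy as np


def conceptual_diversity(responses: np.ndarray) -> float:
    """responses: (n_individuals, n_repeats, n_dims) array of encoded responses."""
    individual_means = responses.mean(axis=1)                 # (n_individuals, n_dims)
    population_mean = individual_means.mean(axis=0)           # (n_dims,)

    between = np.mean(np.sum((individual_means - population_mean) ** 2, axis=-1))
    within = np.mean(np.sum((responses - individual_means[:, None, :]) ** 2, axis=-1))
    return between / (within + 1e-9)   # higher => individuals are genuinely distinct


# Toy comparison: a population whose individuals really differ vs. one whose
# individuals all behave the same way (low conceptual diversity).
rng = np.random.default_rng(0)
diverse = rng.normal(size=(30, 1, 8)) + rng.normal(scale=0.2, size=(30, 10, 8))
uniform = rng.normal(size=(1, 1, 8)) + rng.normal(scale=0.2, size=(30, 10, 8))
print("diverse population:", round(conceptual_diversity(diverse), 2))
print("homogeneous population:", round(conceptual_diversity(uniform), 2))
```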
(3/9) One key issue is whether LLMs capture conceptual diversity: the variation among individuals’ representations of a particular domain. How do we measure this? And how does alignment affect this?
Replies: 1 · Reposts: 0 · Likes: 0
(2/9) There's a lot of interest right now in getting LLMs to mimic the response distributions of “populations” (heterogeneous collections of individuals) for the purposes of political polling, opinion surveys, and behavioral research.
Replies: 1 · Reposts: 0 · Likes: 0
(1/9) Excited to share my recent work on "Alignment reduces LMs' conceptual diversity" with @TomerUllman and @jennhu, to appear at #NAACL2025! 🐟 We want models that match our values...but could this hurt their diversity of thought? Preprint: https://t.co/C4icfhCDGz
Replies: 3 · Reposts: 15 · Likes: 74