Sonia Murthy

@soniakmurthy

Followers: 329 · Following: 136 · Media: 17 · Statuses: 49

cs phd student @harvard · prev predoc @allen_ai, ra @cocosci_lab, undergrad @princeton · she/her

Joined May 2022
@EricBigelow
Eric Bigelow
3 days
📝 New paper! Two strategies have emerged for controlling LLM behavior at inference time: in-context learning (ICL; i.e. prompting) and activation steering. We propose that both can be understood as altering model beliefs, formally in the sense of Bayesian belief updating. 1/9
8
21
125
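A minimal sketch of the belief-updating view the tweet above describes, as a toy discrete model: the hypothesis space, likelihoods, and steering direction here are invented for illustration and are not the paper's actual formalism.

```python
import numpy as np

# Toy setup: the model entertains two "personas" (hypotheses) that assign
# different probabilities to a behavior, e.g. answering formally.
hypotheses = ["formal", "casual"]
prior = np.array([0.5, 0.5])             # belief before any conditioning
p_formal_demo = np.array([0.9, 0.2])     # P(formal demonstration | hypothesis)

def icl_update(prior, likelihoods, n_examples):
    """In-context learning as Bayesian updating: each formal demonstration
    in the prompt is evidence that re-weights the belief over hypotheses."""
    posterior = prior * likelihoods ** n_examples
    return posterior / posterior.sum()

def steer_update(prior, direction, alpha):
    """Activation steering viewed the same way: adding a steering vector is
    modeled here as tilting the log-belief by alpha along a direction."""
    logits = np.log(prior) + alpha * direction
    post = np.exp(logits - logits.max())
    return post / post.sum()

print(icl_update(prior, p_formal_demo, n_examples=4))    # belief after 4 demos
print(steer_update(prior, np.array([1.0, -1.0]), 1.5))   # belief after steering
```

Both interventions move the same quantity (the belief over hypotheses), which is the sense in which the thread frames prompting and steering as two routes to the same update.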
@kushin_m
Kushin Mukherjee
24 days
Zach did a stellar job on our new paper looking at what recipes make for language models that are representationally aligned with humans! Read his tweetprint and recruit him for grad school!
@ZachStuddiford
Zach Studdiford
24 days
We’re drowning in language models: there are over 2 million of them on Huggingface! Can we use some of them to understand which computational ingredients (architecture, scale, post-training, etc.) help us build models that align with human representations? Read on to find out 🧵
2
4
4
@soniakmurthy
Sonia Murthy
1 month
Thanks to my lovely collaborators @rosieyzh, @_jennhu, @ShamKakade6, @m_wulfmeier, Peng Qian, and @TomerUllman and the Kempner Institute! 🧠 [end]
0
0
1
@soniakmurthy
Sonia Murthy
1 month
We also trace the evolution of value trade-offs during alignment by evaluating model checkpoints across 8 unique base model × feedback dataset × alignment algorithm combinations. We see the largest shifts in values early in training, with the strongest effects coming from the choice of base model.
1
0
2
@soniakmurthy
Sonia Murthy
1 month
We use a cognitive model of polite speech to identify how models trade off honesty, kindness, and face-saving. We find that a small reasoning budget reinforces models' default, information-biased trade-offs, and that sycophantic value patterns easily emerge via prompting.
1
0
1
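A hedged sketch of the kind of polite-speech model the tweet above references (an RSA-style speaker that weighs informational against social utility); the utterances, literal-listener values, and weights below are invented for illustration.

```python
import numpy as np

# States are the true quality of, say, a friend's cake: 1 (bad) to 3 (great).
states = np.array([1, 2, 3])
utterances = ["terrible", "okay", "amazing"]
# L0(state | utterance): a literal listener's interpretation (toy values).
literal = np.array([
    [0.80, 0.15, 0.05],   # "terrible"
    [0.10, 0.80, 0.10],   # "okay"
    [0.05, 0.15, 0.80],   # "amazing"
])

def speaker(true_state_idx, phi, rationality=3.0):
    """phi in [0, 1] weights honesty vs. kindness:
    phi = 1 -> purely informative; phi = 0 -> purely face-saving."""
    epistemic = np.log(literal[:, true_state_idx])   # informativity (honesty)
    social = literal @ states                        # expected listener happiness
    utility = phi * epistemic + (1 - phi) * social
    probs = np.exp(rationality * utility)
    return probs / probs.sum()

# An honest speaker vs. a sycophantic one describing a bad cake (state 1):
for phi in (0.9, 0.1):
    print(phi, dict(zip(utterances, speaker(0, phi).round(2))))
```

With a high honesty weight the speaker says "terrible"; with a low one it says "amazing", which is the sycophantic value pattern the tweet describes emerging via prompting.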
@soniakmurthy
Sonia Murthy
1 month
Excited to present our new paper as a spotlight talk 🌟 at the Pragmatic Reasoning in LMs workshop at #COLM2025 this Friday! 🍁 Come by room 520B @ 11:30am tomorrow to learn more about how LLMs' pluralistic values evolve over reasoning budgets and alignment 🧵
1
5
28
@apoorvkh
Apoorv Khandelwal
1 month
In our new paper, we ask whether language models solve compositional tasks using compositional mechanisms. 🧵
4
27
184
@soniakmurthy
Sonia Murthy
7 months
Presenting this today (5/1) at the 4pm poster session (Hall 3) at #NAACL2025! Come chat about alignment, personalization, and all things cognitive science 🐟
@soniakmurthy
Sonia Murthy
9 months
(1/9) Excited to share my recent work on "Alignment reduces LMs' conceptual diversity" with @TomerUllman and @jennhu, to appear at #NAACL2025! 🐟 We want models that match our values...but could this hurt their diversity of thought? Preprint: https://t.co/C4icfhCDGz
0
2
21
@KempnerInst
Kempner Institute at Harvard University
9 months
NEW blog post: Do modern #LLMs capture the conceptual diversity of human populations? #KempnerInstitute researchers find #alignment reduces conceptual diversity of language models. Read more: https://t.co/CbzUj5dIkF @soniakmurthy @tomerullman @_jennhu
0
4
21
@soniakmurthy
Sonia Murthy
9 months
Many thanks to my collaborators and @KempnerInst for helping make this idea come to life!🌱
0
1
2
@soniakmurthy
Sonia Murthy
9 months
(8/9) We think that better understanding such tradeoffs will be important for building LLMs that are aligned with human values: human values are diverse, and our models should be too.
1
0
0
@soniakmurthy
Sonia Murthy
9 months
(7/9) This suggests a trade-off: increasing model safety in terms of value alignment decreases safety in terms of diversity of thought and opinion.
1
0
0
@soniakmurthy
Sonia Murthy
9 months
(6/9) We put a suite of aligned models, and their instruction fine-tuned counterparts, to the test and found:
* no model reaches human-like diversity of thought
* aligned models show LESS conceptual diversity than their instruction fine-tuned counterparts
1
0
0
@soniakmurthy
Sonia Murthy
9 months
(5/9) Our experiments are inspired by human studies in two domains with rich behavioral data.
1
0
0
@soniakmurthy
Sonia Murthy
9 months
(4/9) We introduce a new way of measuring the conceptual diversity of synthetically generated LLM "populations" by considering how the variability of a population's “individuals” relates to the variability of the population as a whole.
1
0
0
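An invented illustration of the individual-vs-population variability idea from (4/9): sample each "individual" (e.g. one persona or seed of an LLM) several times on the same battery of questions, then compare between-individual spread to within-individual spread. The paper's actual metric may differ; this is a sketch of the concept only.

```python
import numpy as np

rng = np.random.default_rng(0)

def diversity_score(responses):
    """responses: array of shape (individuals, samples, questions)."""
    indiv_means = responses.mean(axis=1)       # each individual's mean profile
    between = indiv_means.var(axis=0).mean()   # variation across individuals
    within = responses.var(axis=1).mean()      # variation within an individual
    return between / (between + within)        # 0 = homogeneous, 1 = fully distinct

# A diverse population: individuals center on different answers.
diverse = rng.normal(loc=rng.normal(0, 2, (10, 1, 5)), scale=0.5, size=(10, 20, 5))
# A collapsed population: everyone centers on the same answers.
collapsed = rng.normal(loc=0.0, scale=0.5, size=(10, 20, 5))

print(diversity_score(diverse))    # close to 1
print(diversity_score(collapsed))  # close to 0
```

A population whose individuals vary no more from each other than each does from itself has, in this sense, no conceptual diversity, which is the failure mode the thread reports for aligned models.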
@soniakmurthy
Sonia Murthy
9 months
(3/9) One key issue is whether LLMs capture conceptual diversity: the variation among individuals’ representations of a particular domain. How do we measure this, and how does alignment affect it?
1
0
0
@soniakmurthy
Sonia Murthy
9 months
(2/9) There's a lot of interest right now in getting LLMs to mimic the response distributions of “populations” (heterogeneous collections of individuals) for the purposes of political polling, opinion surveys, and behavioral research.
1
0
0
@soniakmurthy
Sonia Murthy
9 months
(1/9) Excited to share my recent work on "Alignment reduces LMs' conceptual diversity" with @TomerUllman and @jennhu, to appear at #NAACL2025! 🐟 We want models that match our values...but could this hurt their diversity of thought? Preprint: https://t.co/C4icfhCDGz
3
15
74