Sonia Murthy
@soniakmurthy
Followers: 329 · Following: 136 · Media: 17 · Statuses: 49
cs phd student @harvard · prev predoc @allen_ai, ra @cocosci_lab, undergrad @princeton · she/her
Joined May 2022
📝 New paper! Two strategies have emerged for controlling LLM behavior at inference time: in-context learning (ICL; i.e. prompting) and activation steering. We propose that both can be understood as altering model beliefs, formally in the sense of Bayesian belief updating. 1/9
Replies: 8 · Reposts: 21 · Likes: 125
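As a rough illustration of that framing (the notation here is mine, not necessarily the paper's): treat the prompt as evidence about a latent concept the model holds a prior over, so conditioning on the prompt is a Bayesian belief update, and predictions marginalize over the updated beliefs.

```latex
% Illustrative notation (an assumption, not necessarily the paper's):
% \theta = latent concept / task variable, D = the prompt (in-context evidence).
\begin{align}
  p(\theta \mid D) &\propto p(D \mid \theta)\, p(\theta), \\
  p(y \mid x, D)   &= \int p(y \mid x, \theta)\, p(\theta \mid D)\, d\theta .
\end{align}
```

Under this reading, prompting supplies the evidence D explicitly, while activation steering can be seen as nudging the model's internal state toward the beliefs such evidence would induce.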
Zach did a stellar job on our new paper looking at what recipes make for language models that are representationally aligned with humans! Read his tweetprint and recruit him for grad school!
We’re drowning in language models — there are over 2 million of them on Hugging Face! Can we use some of them to understand which computational ingredients (architecture, scale, post-training, etc.) help us build models that align with human representations? Read on to find out 🧵
Replies: 2 · Reposts: 4 · Likes: 4
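For context, one common way to quantify human-model representational alignment (not necessarily the exact protocol in this paper) is representational similarity analysis: correlate the pairwise similarity structure of model embeddings with human similarity judgments over the same stimuli. A minimal sketch, with random data standing in for real stimuli:

```python
# Hedged sketch: representational similarity analysis (RSA) as one way to
# score human-model alignment. Illustrative only, not the paper's protocol.
import numpy as np
from scipy.stats import spearmanr
from scipy.spatial.distance import pdist


def rsa_alignment(model_embeddings: np.ndarray, human_similarity: np.ndarray) -> float:
    """Spearman correlation between model and human pairwise dissimilarities.

    model_embeddings: (n_stimuli, d) array of model representations.
    human_similarity: (n_stimuli, n_stimuli) matrix of human similarity ratings.
    """
    model_dists = pdist(model_embeddings, metric="cosine")   # condensed upper triangle
    iu = np.triu_indices_from(human_similarity, k=1)
    human_dists = 1.0 - human_similarity[iu]                 # similarity -> dissimilarity
    rho, _ = spearmanr(model_dists, human_dists)
    return rho


# Example with random data standing in for real stimuli:
rng = np.random.default_rng(0)
emb = rng.normal(size=(20, 64))                              # fake model embeddings
human = np.corrcoef(rng.normal(size=(20, 30)))               # fake human similarity matrix
print(f"RSA alignment (illustrative): {rsa_alignment(emb, human):.3f}")
```

Higher correlations mean the model carves up the stimulus space more like people do.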
Thanks to my lovely collaborators @rosieyzh, @_jennhu, @ShamKakade6, @m_wulfmeier, Peng Qian, and @TomerUllman and the Kempner Institute! 🧠 [end]
Replies: 0 · Reposts: 0 · Likes: 1
We also trace the evolution of value trade-offs during alignment by evaluating model checkpoints across 8 unique base model × feedback dataset × alignment algorithm combinations. We see the largest shifts in values early in training, with the strongest effects coming from the choice of base model.
Replies: 1 · Reposts: 0 · Likes: 2
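A minimal sketch of that kind of sweep, with placeholder model, dataset, and algorithm names (none of these are the paper's actual configurations): score value trade-offs at several training checkpoints for every base model × feedback dataset × alignment algorithm combination.

```python
# Hedged sketch of the checkpoint sweep described above. All names
# (models, datasets, algorithms, steps) are placeholders.
from itertools import product

base_models = ["base-A", "base-B"]
feedback_datasets = ["feedback-1", "feedback-2"]
alignment_algorithms = ["dpo", "ppo"]            # 2 x 2 x 2 = 8 unique combinations
checkpoint_steps = [100, 500, 1000, 5000]


def evaluate_value_tradeoffs(model: str, dataset: str, algorithm: str, step: int) -> dict:
    """Placeholder for scoring a checkpoint's honesty/kindness/face trade-offs."""
    return {"informational": 0.0, "social": 0.0}  # stub values


for model, dataset, algorithm in product(base_models, feedback_datasets, alignment_algorithms):
    for step in checkpoint_steps:
        scores = evaluate_value_tradeoffs(model, dataset, algorithm, step)
        print(model, dataset, algorithm, step, scores)
```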
We use a cognitive model of polite speech to identify how models trade off being honest, being kind, and saving face. We find that a small reasoning budget reinforces models' default, information-biased trade-offs, and that sycophantic value patterns easily emerge via prompting.
Replies: 1 · Reposts: 0 · Likes: 1
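For intuition, here is a toy RSA-style politeness model in the spirit of that cognitive-modeling tradition (the utterances, utilities, and weights are illustrative assumptions, not the paper's actual setup): a speaker trades off informational utility (honesty) against social utility (kindness), and shifting the weights turns an honest speaker into a sycophantic one.

```python
# Hedged sketch of an RSA-style polite-speech trade-off. Illustrative only.
import numpy as np

utterances = ["terrible", "bad", "okay", "good", "amazing"]
states = np.arange(5)        # possible true qualities, 0 (worst) .. 4 (best)
true_state = 1               # suppose the work really was "bad"


def literal_listener(utterance_idx: int) -> np.ndarray:
    """Toy literal listener: puts most probability mass near the state the word names."""
    probs = np.exp(-0.5 * (states - utterance_idx) ** 2)
    return probs / probs.sum()


def speaker_utility(utterance_idx: int, phi_inf: float, phi_soc: float) -> float:
    listener = literal_listener(utterance_idx)
    informational = np.log(listener[true_state] + 1e-9)   # honesty: make the true state recoverable
    social = float(listener @ states) / states.max()      # kindness: imply a flattering state
    return phi_inf * informational + phi_soc * social


def speaker_distribution(phi_inf: float, phi_soc: float, alpha: float = 5.0) -> np.ndarray:
    utils = np.array([speaker_utility(i, phi_inf, phi_soc) for i in range(len(utterances))])
    probs = np.exp(alpha * (utils - utils.max()))          # softmax choice rule
    return probs / probs.sum()


# An information-biased speaker vs. a socially biased (sycophantic) one:
for phi_inf, phi_soc, label in [(1.0, 0.1, "honest"), (0.05, 1.0, "sycophantic")]:
    dist = speaker_distribution(phi_inf, phi_soc)
    print(label, dict(zip(utterances, dist.round(2))))
```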
Paper: https://t.co/feDaH3RvKY
Kempner Deeper Learning blog feature: https://t.co/JiEqGgeiPx
Code: https://t.co/vshmvaIS4f
Brief highlights below!
github.com · skmur/many-wolves: Using cognitive models to reveal value trade-offs in language models
Replies: 1 · Reposts: 0 · Likes: 2
Excited to present our new paper as a spotlight talk 🌟 at the Pragmatic Reasoning in LMs workshop at #COLM2025 this Friday! 🍁 Come by room 520B @ 11:30am tomorrow to learn more about how LLMs' pluralistic values evolve over reasoning budgets and alignment 🧵
Replies: 1 · Reposts: 5 · Likes: 28
In our new paper, we ask whether language models solve compositional tasks using compositional mechanisms. 🧵
Replies: 4 · Reposts: 27 · Likes: 184
Presenting this today (5/1) at the 4pm poster session (Hall 3) at #NAACL2025! Come chat about alignment, personalization, and all things cognitive science 🐟
(1/9) Excited to share my recent work on "Alignment reduces LMs' conceptual diversity" with @TomerUllman and @jennhu, to appear at #NAACL2025! 🐟 We want models that match our values...but could this hurt their diversity of thought? Preprint: https://t.co/C4icfhCDGz
Replies: 0 · Reposts: 2 · Likes: 21
NEW blog post: Do modern #LLMs capture the conceptual diversity of human populations? #KempnerInstitute researchers find that #alignment reduces the conceptual diversity of language models. Read more: https://t.co/CbzUj5dIkF
@soniakmurthy @tomerullman @_jennhu
Replies: 0 · Reposts: 4 · Likes: 21
Many thanks to my collaborators and @KempnerInst for helping make this idea come to life!🌱
Replies: 0 · Reposts: 1 · Likes: 2
(9/9) Code and data for our experiments can be found at: https://t.co/CSicsUKs64 Preprint: https://t.co/C4icfhCDGz Also, check out our feature in the @KempnerInst Deeper Learning Blog!
kempnerinstitute.harvard.edu · As large language models (LLMs) have become more sophisticated, there’s been growing interest in using LLM-generated responses in place of human data for tasks such as polling, user studies, and […]
Replies: 1 · Reposts: 0 · Likes: 0
(8/9) We think that better understanding such trade-offs will be important for building LLMs that are aligned with human values: human values are diverse, and our models should be too.
Replies: 1 · Reposts: 0 · Likes: 0
(7/9) This suggests a trade-off: increasing model safety in terms of value alignment decreases safety in terms of diversity of thought and opinion.
Replies: 1 · Reposts: 0 · Likes: 0
(6/9) We put a suite of aligned models, and their instruction fine-tuned counterparts, to the test and found:
* No model reaches human-like diversity of thought.
* Aligned models show LESS conceptual diversity than their instruction fine-tuned counterparts.
Replies: 1 · Reposts: 0 · Likes: 0
(5/9) Our experiments are inspired by human studies in two domains with rich behavioral data.
Replies: 1 · Reposts: 0 · Likes: 0
(4/9) We introduce a new way of measuring the conceptual diversity of synthetically generated LLM “populations”: we consider how the variability of a population's “individuals” relates to the variability of the population as a whole.
Replies: 1 · Reposts: 0 · Likes: 0
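One way to make that idea concrete (an illustrative statistic, not necessarily the paper's exact measure): compare how much simulated individuals differ from one another to how much each individual varies across its own repeated responses.

```python
# Hedged sketch of a conceptual-diversity score: between-individual variability
# relative to within-individual variability. Illustrative only.
import numpy as np


def conceptual_diversity(responses: np.ndarray) -> float:
    """responses: (n_individuals, n_repeats, n_dims) array of encoded responses."""
    individual_means = responses.mean(axis=1)                 # (n_individuals, n_dims)
    population_mean = individual_means.mean(axis=0)           # (n_dims,)

    between = np.mean(np.sum((individual_means - population_mean) ** 2, axis=-1))
    within = np.mean(np.sum((responses - individual_means[:, None, :]) ** 2, axis=-1))
    return between / (within + 1e-9)   # higher => individuals are genuinely distinct


# Toy comparison: a population whose individuals really differ vs. one whose
# individuals all behave the same way (low conceptual diversity).
rng = np.random.default_rng(0)
diverse = rng.normal(size=(30, 1, 8)) + rng.normal(scale=0.2, size=(30, 10, 8))
uniform = rng.normal(size=(1, 1, 8)) + rng.normal(scale=0.2, size=(30, 10, 8))
print("diverse population:", round(conceptual_diversity(diverse), 2))
print("homogeneous population:", round(conceptual_diversity(uniform), 2))
```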
(3/9) One key issue is whether LLMs capture conceptual diversity: the variation among individuals’ representations of a particular domain. How do we measure this? And how does alignment affect this?
Replies: 1 · Reposts: 0 · Likes: 0
(2/9) There's a lot of interest right now in getting LLMs to mimic the response distributions of “populations” (heterogeneous collections of individuals) for the purposes of political polling, opinion surveys, and behavioral research.
Replies: 1 · Reposts: 0 · Likes: 0
(1/9) Excited to share my recent work on "Alignment reduces LMs' conceptual diversity" with @TomerUllman and @jennhu, to appear at #NAACL2025! 🐟 We want models that match our values...but could this hurt their diversity of thought? Preprint: https://t.co/C4icfhCDGz
Replies: 3 · Reposts: 15 · Likes: 74