Sarah Wiegreffe @sarahwiegreffe X Profile

Sarah Wiegreffe

@sarahwiegreffe

Followers

5K

Following

10K

Media

73

Statuses

1K

Research in NLP (mostly LM interpretability & explainability). Incoming assistant prof @umdcs @clipumd Current postdoc @allen_ai @uwnlp Views my own.

Joined September 2013


Sarah Wiegreffe

@sarahwiegreffe

21 days

A bit late to announce, but I’m excited to share that I'll be starting as an assistant professor at the University of Maryland @umdcs this August. I'll be recruiting PhD students this upcoming cycle for fall 2026. (And if you're a UMD grad student, sign up for my fall seminar!)

70

48

565

Sarah Wiegreffe

@sarahwiegreffe

17 days

Go Ai2 @allen_ai!

clem 🤗

@ClementDelangue

17 days

This race is not zero-sum and benefits the whole humanity!

0

12

Sarah Wiegreffe

@sarahwiegreffe

2 months

RT @OrgadHadas: 🚨 Announcing the keynote speakers in the @ActInterp workshop at #icml2025. Join us to hear these leading experts share their…

0

5

0

Sarah Wiegreffe

@sarahwiegreffe

2 months

We got more submissions to the workshop than we anticipated, and are looking for reviewers willing to review 2-4 papers between May 24 and June 7. If you are interested, please self-nominate! Thank you 🙏

Actionable Interpretability Workshop ICML2025

@ActInterp

3 months

🚨 We're looking for reviewers for the workshop! If you're passionate about making interpretability useful and want to help shape the conversation, we'd love your input. Sign up to review >>💡🔍

0

5

13

Sarah Wiegreffe

@sarahwiegreffe

2 months

RT @boknilev: BlackboxNLP will be co-located with #EMNLP2025 in Suzhou this November! 📷 This edition will feature a new shared task on circu…

0

20

0

Sarah Wiegreffe

@sarahwiegreffe

2 months

RT @tal_haklay: We knew many of you wanted to submit to our Actionable Interpretability workshop, but we didn’t expect to crash Overleaf! 😏…

0

5

0

Sarah Wiegreffe

@sarahwiegreffe

2 months

RT @boknilev: Since people have been asking: the #blackboxNLP workshop will return this year, to be held with #emnlp2025. This workshop i…

0

11

0

Sarah Wiegreffe

@sarahwiegreffe

2 months

RT @OrgadHadas: Just 6 days left! ⏰ Submit your work to the Actionable Interpretability Workshop at #ICML2025 by May 19th. Contribute to th…

0

5

0

Sarah Wiegreffe

@sarahwiegreffe

2 months

We extended the deadline by 10 days! Consider submitting ⬇️

Hadas Orgad

@OrgadHadas

2 months

Congratulations to everyone whose papers were accepted to @icmlconf! If your work lies at the intersection of interpretability and actionability/practicality/usefulness, consider submitting it to our @ActInterp conference track! Deadline is May 19th.

0

4

15

Sarah Wiegreffe

@sarahwiegreffe

2 months

RT @lasha_nlp: Stoked that HALoGEN (non-archival version) won best paper award at the TrustNLP workshop @ #NAACL2025! Our work explore…

0

11

0

Sarah Wiegreffe

@sarahwiegreffe

2 months

@jack_merullo_ @nlpnoah @yanaiela More details in Yanai's thread.

Yanai Elazar

@yanaiela

2 months

💡 New ICLR paper! 💡 "On Linear Representations and Pretraining Data Frequency in Language Models": We provide an explanation for when & why linear representations form in large (or small) language models. Led by @jack_merullo_, w/ @nlpnoah & @sarahwiegreffe

0

3

Sarah Wiegreffe

@sarahwiegreffe

2 months

Check out our new preprint/project, which has been over a year in the making! This has been a very fun collaboration (and one of the biggest I've personally participated in). We are quite excited about the leaderboard and release, and are open to feedback to help this remain a…

Aaron Mueller

@amuuueller

2 months

Lots of progress in mech interp (MI) lately! But how can we measure when new mech interp methods yield real improvements over prior work? We propose 😎 𝗠𝗜𝗕: a Mechanistic Interpretability Benchmark!

0

2

29

Sarah Wiegreffe

@sarahwiegreffe

2 months

RT @yanaiela: 💡 New ICLR paper! 💡 "On Linear Representations and Pretraining Data Frequency in Language Models": We provide an explanation…

0

42

0

Sarah Wiegreffe

@sarahwiegreffe

2 months

2) On the connection between linear relational embeddings in LMs and frequency of the relations in the pretraining data.
- Led by @jack_merullo_, w/ @nlpnoah @yanaiela.
- Yanai is presenting the poster tomorrow 04/26 from 10am-12:30pm (Hall 3+Hall 2B #236)

1

11

Sarah Wiegreffe

@sarahwiegreffe

2 months

I'm not at #ICLR25, but have 2 works being presented:
1) Understanding how LMs answer multiple-choice questions.
- @boknilev is presenting the poster this morning 10-12:30 (Hall 3+Hall 2B #207).
- Also with @oyvindtafjord @HannaHajishirzi @Ashish_S_AI

1

8

29

Sarah Wiegreffe

@sarahwiegreffe

3 months

Have work on the actionable impact of interpretability findings? Consider submitting to our Actionable Interpretability workshop at ICML! Website: Deadline: May 9.

Hadas Orgad

@OrgadHadas

3 months

🎉 Our Actionable Interpretability workshop has been accepted to #ICML2025! 🎉 >> Follow @ActInterp. @tal_haklay @anja_reu @mariusmosbach @sarahwiegreffe @iftenney @megamor2. Paper submission deadline: May 9th!

0

4

23

Sarah Wiegreffe

@sarahwiegreffe

6 months

RT @TuhinChakr: If you are looking to do a PhD in Generative AI, Creativity and Human Behavior, please apply to @sbucompsc PhD program by…

0

41

0

Sarah Wiegreffe

@sarahwiegreffe

6 months

RT @jacobandreas: Are you an undergrad interested in NLP research? Intern with us through the MIT summer research program! Includes stipend…

0

32

0

Sarah Wiegreffe

@sarahwiegreffe

8 months

RT @BlackboxNLP: @hima_lakkaraju @lieberum_t @sarahwiegreffe wrapping up the oral session with a presentation of "Mechanistic?". https://t.…

0

5

0