
Guy Davidson
@guyd33
Followers: 1K · Following: 13K · Media: 73 · Statuses: 962
PhD @NYUDataScience, visiting researcher @AIatMeta, interested in AI & CogSci, specifically in goals and their representations in minds and machines (he/him).
New York, USA
Joined April 2019
Cool new work on localizing and removing concepts using attention heads from colleagues at NYU and Meta!
How would you make an LLM "forget" the concept of dog, or any other arbitrary concept? 🐶 We introduce SAMD & SAMI, a novel, concept-agnostic approach to identify and manipulate attention modules in transformers.
RT @karen_ullrich: How would you make an LLM "forget" the concept of dog, or any other arbitrary concept? 🐶 We introduce SAMD & SAMI, a…
You (yes, you!) should work with Sydney! Either short-term this summer, or longer term at her nascent lab at NYU!
I'm hiring! There are two open positions:
1. Summer research position (best for master's or graduate student); focus on computational social cognition.
2. Postdoc (currently interviewing!); focus on computational social cognition and AI safety.
RT @soniajoseph_: Our paper Prisma: An Open Source Toolkit for Mechanistic Interpretability in Vision and Video received an Oral at the Mec…
Fantastic new work by @jcyhc_ai (with @LakeBrenden and me trying not to cause too much trouble). We study systematic generalization in a safety setting and find LLMs struggle to consistently respond safely when we vary how we ask naive questions. More fun analyses in the paper!
Do LLMs show systematic generalization of safety facts to novel scenarios? Introducing our work SAGE-Eval, a benchmark consisting of 100+ safety facts and 10k+ scenarios to test this!
- Claude-3.7-Sonnet passes only 57% of facts evaluated.
- o1 and o3-mini passed <45%! 🧵
If you made it this far, thank you, and don't hesitate to reach out! 17/N=17. Paper: Code:
github.com
Contribute to guydav/prompting-methods-task-representations development by creating an account on GitHub.
As with pretty much everything else I've worked on in grad school, this work would have looked different (and almost certainly worse) without the guidance of my advisors, @LakeBrenden and @todd_gureckis. I continue to appreciate your thoughtful engagement with me/my work! 16/N
This work would also have been impossible without @adinamwilliams's guidance, the freedom she gave me in picking a problem to study, and her belief that I could tackle it despite it being my first foray into (mechanistic) interpretability work. 15/N
We owe a great deal of gratitude to @ericwtodd, not only for open-sourcing their code, but also for answering our numerous questions over the last few months. If you find this interesting, you should also read their paper introducing function vectors. 14/N