Declan Grabb, MD @declangrabbmd X Profile

Declan Grabb, MD

@declangrabbmd

Followers: 102

Following: 359

Media: 7

Statuses: 59

Joined February 2023

Declan Grabb, MD

@declangrabbmd

7 months

RT @pash22: AI’s dangerous mental-health blind spot: People are increasingly turning to chatbots that struggle to detect violent or suicida…

statnews.com

People are increasingly turning to chatbots for help. But AIs struggle to detect violent or suicidal intentions.

0

3

0

Declan Grabb, MD

@declangrabbmd

7 months

RT @MLamparth: Check out our new op-ed in @statnews about mental-health blind spots of chatbots posing risks to users in mental health emer…

0

2

0

Declan Grabb, MD

@declangrabbmd

7 months

So honored to have been able to contribute to this work with @MLCommons, adding more granularity to (and expanding) the way the AI community views mental health in the context of AI safety! I’ll be at @NeurIPSConf this week, presenting on related topics. Would love to chat with.

MLCommons

@MLCommons

8 months

Announcing the release of AILuminate, a first-of-its-kind benchmark to measure the safety of LLMs. The AILuminate v1.0 benchmark offers a comprehensive set of safety grades for today's most prevalent #LLMs. (1/4)

0

1

2

Declan Grabb, MD

@declangrabbmd

9 months

Thank you @APApsychiatric for the invitation to speak about user safety and AI today — I always love when I can talk about sparse autoencoders and the DSM all within the same hour!

0

4

Declan Grabb, MD

@declangrabbmd

9 months

Incredible work by @esindurmusnlp!

Anthropic

@AnthropicAI

9 months

New Anthropic research: Evaluating feature steering. In May, we released Golden Gate Claude: an AI fixated on the Golden Gate Bridge due to our use of “feature steering”. We've now done a deeper study on the effects of feature steering. Read the post:

1

0

1

Declan Grabb, MD

@declangrabbmd

9 months

Link to paper:

0

Declan Grabb, MD

@declangrabbmd

9 months

There is a ton more to do here, and I’m particularly excited to repeat this work with larger SAEs, look more exhaustively for MHRFs, and see how the presence of MHRFs can impact a model’s emotional intelligence! Please message or email if you’re interested in collaborating!

1

0

Declan Grabb, MD

@declangrabbmd

9 months

Clamping this feature, though, made the model far less likely to provide harmful advice to users!

1

0
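A minimal sketch of what amplifying or clamping a single sparse-autoencoder feature can look like, for readers unfamiliar with the idea. This is not the paper's code: the vectors are toy values, the function names are hypothetical, and clamping to zero is an assumption (the thread does not state the clamp value). It uses one common approach, which shifts the residual-stream vector along the feature's decoder direction by the gap between the target and the current activation.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def feature_activation(resid: Vector, enc_dir: Vector, enc_bias: float) -> float:
    """ReLU(resid . enc_dir + bias): activation of one SAE feature."""
    pre = dot(resid, enc_dir) + enc_bias
    return pre if pre > 0.0 else 0.0


def set_feature(resid: Vector, enc_dir: Vector, enc_bias: float,
                dec_dir: Vector, target: float) -> Vector:
    """Shift resid along dec_dir so the feature's contribution moves
    from its current activation to `target` (hypothetical helper)."""
    current = feature_activation(resid, enc_dir, enc_bias)
    delta = target - current
    return [r + delta * d for r, d in zip(resid, dec_dir)]


# Toy values (assumptions, not taken from Gemma-2-2B or Gemma Scope).
# Here enc_dir . dec_dir == 1, so re-encoding recovers the target exactly;
# that is not guaranteed for a real SAE.
resid = [1.0, 2.0, 0.0]
enc_dir = [0.0, 1.0, 0.0]
dec_dir = [0.0, 1.0, 0.0]
enc_bias = -0.5

baseline = feature_activation(resid, enc_dir, enc_bias)
clamped = set_feature(resid, enc_dir, enc_bias, dec_dir, target=0.0)
amplified = set_feature(resid, enc_dir, enc_bias, dec_dir, target=6.0)

print(baseline, clamped, amplified)
```

In a real model the modified vector would replace the residual stream at the chosen layer during the forward pass; that hook is omitted here.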

Declan Grabb, MD

@declangrabbmd

9 months

I also focused in on one MHRF that pertained to suicide in Layer 25. I found that it activated more strongly on words pertaining to self-harm, and I found that amplifying it resulted in a more unsafe model that openly discussed self-harm.

1

0

Declan Grabb, MD

@declangrabbmd

9 months

If this pattern holds for larger SOTA language models, it might help explain why they’re a bit worse at detecting psychosis and mania than instances of self-harm and suicide, something we found in prior work.

1

0

Declan Grabb, MD

@declangrabbmd

9 months

Yet Gemma-2-2B lacks a comparable number of features that pertain to hallmark symptoms of bipolar disorder or schizophrenia.

1

0

Declan Grabb, MD

@declangrabbmd

9 months

🔎 In this initial work, I find that Gemma-2-2B has many internal features (what we call Mental-Health-Related Features or MHRFs) that represent complex concepts like sadness and suicide across most layers.

1

0

Declan Grabb, MD

@declangrabbmd

9 months

Immensely grateful to the folks at @neuronpedia and the work behind Gemma Scope (@lieberum_t et al) for creating tools that allow domain experts to investigate language models at such a granular level. Also grateful to @MLamparth for sharing his expertise in interpretability with.

1

0

1

Declan Grabb, MD

@declangrabbmd

9 months

I’m proud and excited that this paper was accepted to the NeurIPS Behavioral ML Workshop! 🧠 @BehavioralML. As a psychiatrist, I’m often surprised at how language models miss very obvious signs of user mental health crises, and I wanted to try to help solve this problem! 🧵

1

0

2

Declan Grabb, MD

@declangrabbmd

9 months

Come talk to us at @AIESConf about the ethical challenges that AI presents in mental healthcare + how it can be leveraged to democratize access to mental healthcare at scale!

Max Lamparth

@MLamparth

9 months

Today, we'll present our work at #AIES @AIESConf on:
- What are the ethical challenges associated with psychiatric care?
- What does ethical AI decision-making look like in this context?
- Are current language models safe enough?
Come to poster session 2!

0

1

6

Declan Grabb, MD

@declangrabbmd

9 months

Super excited to have contributed to this work with @StanfordHAI. Definitely check this summary paper out for actionable recommendations on deploying AI in high-stakes settings!

Stanford HAI

@StanfordHAI

9 months

New: The #RAISEHealth Symposium Summary Paper is now out! Featuring insights from 60+ experts and actionable recommendations on the responsible use of AI to transform biomedicine. Find out more: @StanfordMed

0

1

6

Declan Grabb, MD

@declangrabbmd

9 months

0

2

Declan Grabb, MD

@declangrabbmd

9 months

I’ll be giving an invited talk on “Mental Health Risk & AI” for the @APApsychiatric on November 5. I hope to be specific and actionable — highlighting the clear need for a combination of technical and clinical expertise to maximize AI’s utility for all users while mitigating.

1

Declan Grabb, MD

@declangrabbmd

9 months

RT @MLamparth: Had a great time at @COLM_conf in Philly presenting our work with @declangrabbmd and @NinaVasan! Thank you for everyone who…

0

2

0

Declan Grabb, MD

@declangrabbmd

10 months

RT @MLamparth: Excited to present our work at @COLM_conf in Philadelphia starting tomorrow with @declangrabbmd and @NinaVasan! Hit me up i…

0

2

0