Jared Moore (@jaredlcm)
AI Researcher, Writer · Stanford · @jaredlcm.bsky.social

Joined June 2017
Jared Moore (@jaredlcm) · 19 days
Our conclusion: "LLMs’ apparent ToM abilities may be fundamentally different from humans' and might not extend to complex interactive tasks like planning." Preprint: Code: Demo: /end 🧵
[Link card: github.com — jlcmoore/mindgames]
Jared Moore (@jaredlcm) · 19 days
This work began at @DivIntelligence and is in collaboration w/ @nedcpr, @RasmusOvermark, Beba Cibralic, @nickhaber, and @camrobjones.
Jared Moore (@jaredlcm) · 19 days
I'll be talking about this in SF at #CogSci2025 this Friday at 4pm. I'll also be presenting it at the PragLM workshop at COLM in Montreal this October.
Jared Moore (@jaredlcm) · 19 days
LLMs are already deployed as educators, therapists, and companions. In our discrete-game variant (HIDDEN condition), o1-preview jumped to 80% success when forced to choose between asking and telling. The capability exists, but the instinct to understand before persuading doesn't.
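A minimal sketch of what such a forced-choice probe could look like, assuming a generic text-in/text-out `query_model` callable; the action wording and helper names are illustrative, not the paper's code.

```python
# Illustrative sketch (not the paper's implementation): restrict the model's
# next move to a discrete choice between asking about the target's mental
# state and disclosing information.

ACTIONS = {
    "A": "ASK the target what they know or want.",
    "B": "TELL the target a fact about the proposals.",
}

def forced_choice_prompt(dialogue_history: str) -> str:
    """Build a prompt that restricts the next move to a discrete choice."""
    options = "\n".join(f"{k}) {v}" for k, v in ACTIONS.items())
    return (
        "You are negotiating over three proposals. You do not know the "
        "other party's knowledge or preferences.\n\n"
        f"Dialogue so far:\n{dialogue_history}\n\n"
        f"Choose exactly one next move:\n{options}\n"
        "Answer with a single letter."
    )

def next_action(dialogue_history: str, query_model) -> str:
    """Return 'A' or 'B'; `query_model` is any text-in/text-out LLM call."""
    reply = query_model(forced_choice_prompt(dialogue_history)).strip().upper()
    return "A" if reply.startswith("A") else "B"
```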
Jared Moore (@jaredlcm) · 19 days
These findings suggest distinct ToM capabilities:
* Spectatorial ToM: observing and predicting mental states.
* Planning ToM: actively intervening to change mental states through interaction.
Current LLMs excel at the first but fail at the second.
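For intuition, here is a minimal sketch of how the two probe types differ in shape, assuming a generic `llm` callable and a `respond` function standing in for the target; all names are illustrative, not from the paper's codebase.

```python
def spectatorial_probe(llm, story: str, question: str) -> str:
    """Spectatorial ToM: one-shot prediction of an observed agent's mental state."""
    return llm(f"{story}\n\nQuestion: {question}\nAnswer:")

def planning_probe(llm, goal: str, respond, max_turns: int = 10) -> list[str]:
    """Planning ToM: multi-turn intervention. The model must act on the target
    (`respond` maps an utterance to the target's reply) to change its beliefs."""
    transcript = []
    for _ in range(max_turns):
        utterance = llm(f"Goal: {goal}\nTranscript: {transcript}\nYour next message:")
        transcript.append(f"PERSUADER: {utterance}")
        transcript.append(f"TARGET: {respond(utterance)}")
    return transcript
```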
Jared Moore (@jaredlcm) · 19 days
Why do LLMs fail in the HIDDEN condition? They don't ask the right questions. Human participants appeal to the target's mental states ~40% of the time ("What do you know?" "What do you want?"). LLMs? At most 23%. They start disclosing info without interacting with the target.
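A rough sketch of how such a rate could be measured over persuader messages; the keyword patterns below are stand-ins for the paper's actual annotation scheme.

```python
import re

# Hypothetical patterns flagging appeals to the target's mental state.
MENTAL_STATE_PATTERNS = [
    r"\bwhat do you (know|want|think|prefer)\b",
    r"\bdo you (know|want|think|prefer|believe)\b",
    r"\byour (preference|priorit|belief|goal)",
]

def appeals_rate(messages: list[str]) -> float:
    """Fraction of persuader messages that ask about the target's mental state."""
    hits = sum(
        any(re.search(p, m, re.IGNORECASE) for p in MENTAL_STATE_PATTERNS)
        for m in messages
    )
    return hits / len(messages) if messages else 0.0

# Example: one mental-state question out of two messages -> 0.5.
print(appeals_rate(["What do you want out of this?", "Proposal B funds parks."]))
```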
Jared Moore (@jaredlcm) · 19 days
Key findings:
* REVEALED condition (mental states given to the persuader): humans 22% success ❌; o1-preview 78% success ✅.
* HIDDEN condition (persuader must infer mental states): humans 29% success ✅; o1-preview 18% success ❌.
A complete reversal!
Jared Moore (@jaredlcm) · 19 days
Setup: You must convince someone* to choose your preferred proposal among 3 options. But they have less information and different preferences than you. To win, you must figure out what they know, what they want, and strategically reveal the right info to persuade them. (*a bot)
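One way to picture the information structure described above is as a data-structure sketch; the field names here are assumptions for illustration, not the paper's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class NegotiationGame:
    proposals: tuple[str, str, str]          # the three options on the table
    persuader_target: str                    # proposal the persuader must win
    target_preference: dict[str, float]      # target's private utilities
    target_known_facts: set[str]             # what the target knows at the start
    persuader_known_facts: set[str]          # superset: persuader knows more
    revealed: set[str] = field(default_factory=set)  # facts disclosed so far

    def disclose(self, fact: str) -> None:
        """Strategically reveal one fact the target may lack."""
        assert fact in self.persuader_known_facts
        self.revealed.add(fact)
        self.target_known_facts.add(fact)

    def persuader_wins(self, target_choice: str) -> bool:
        """Success: the target picks the persuader's preferred proposal."""
        return target_choice == self.persuader_target
```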
Jared Moore (@jaredlcm) · 19 days
I'm excited to share work to appear at @COLM_conf! Theory of Mind (ToM) lets us understand others' mental states. Can LLMs go beyond predicting mental states to changing them? We introduce MINDGAMES to test Planning ToM: the ability to intervene on others' beliefs & persuade them.
Jared Moore (@jaredlcm) · 2 months
RT @harveyiyun: LLMs excel at finding surprising “needles” in very long documents, but can they detect when information is conspicuously mi…
Jared Moore (@jaredlcm) · 4 months
📝Read our pre-print on why "Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers" here:
[Link card: arxiv.org — "Should a large language model (LLM) be used as a therapist? In this paper, we investigate the use of LLMs to *replace* mental health providers, a use case promoted in the tech startup and research..."]
Jared Moore (@jaredlcm) · 4 months
📋We further identify **fundamental** reasons not to use LLMs as therapists, e.g., therapy involves a human relationship: LLMs cannot fully allow a client to practice what it means to be in a human relationship. (LLMs also can't provide in-person therapy, such as OCD exposures.)
Jared Moore (@jaredlcm) · 4 months
🔎We came up with these experiments by conducting a mapping review of what constitutes good therapy, and we identify **practical** reasons that LLM-powered therapy chatbots fail (e.g., they express stigma and respond inappropriately).
Jared Moore (@jaredlcm) · 4 months
📈Bigger and newer LLMs exhibit similar amounts of stigma toward different mental health conditions as smaller and older LLMs do.
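For illustration, a stigma probe of this general shape, in the style of standard social-distance items, could look like the following; the conditions, items, and scoring below are assumptions, not the paper's materials.

```python
# Hypothetical vignette-based stigma probe: present a person with a condition,
# ask social-distance questions, and score refusals.

CONDITIONS = ["depression", "schizophrenia", "alcohol dependence"]

ITEMS = [
    "Would you be willing to work closely with the person described?",
    "Would you let the person described marry into your family?",
]

def stigma_score(llm, condition: str) -> float:
    """Share of items the model answers 'no' to (higher = more stigma).
    `llm` is any text-in/text-out callable."""
    vignette = f"Here is a person who has {condition}."
    noes = 0
    for item in ITEMS:
        answer = llm(f"{vignette}\n{item}\nAnswer yes or no:").strip().lower()
        noes += answer.startswith("no")
    return noes / len(ITEMS)
```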
Jared Moore (@jaredlcm) · 4 months
📉Large language models (LLMs) in general struggle to respond appropriately to questions about delusions, suicidal ideation, and OCD and perform significantly worse than N=16 human therapists.
Jared Moore (@jaredlcm) · 4 months
🚨Commercial therapy bots give dangerous responses to prompts that indicate crisis, as well as other inappropriate responses. (The @APA has been trying to regulate these bots.)
Jared Moore (@jaredlcm) · 4 months
🧵I'm thrilled to announce that I'll be going to @FAccTConference this June to present timely work on why current LLMs cannot safely **replace** therapists. We find ⤵️
Jared Moore (@jaredlcm) · 8 months
Still looking for a good gift? 🎁 Try my book, which just had its first birthday! @KirkusReviews called it a "thought-provoking tech tale." @kentarotoyama said it "reads less like sci-fi satire and more as poignant, pointed commentary on homo sapiens."
Jared Moore (@jaredlcm) · 9 months
I just landed in Miami to present at @emnlpmeeting the work I did with @Diyi_Yang from @stanfordnlp. Please reach out if you'd like to meet! And read @StanfordHAI's post about our work here:
[Link card: hai.stanford.edu — "New research tests large language models for consistency across diverse topics, revealing that while they handle neutral topics reliably, controversial issues lead to varied answers."]