Jared Moore (@jaredlcm)
AI Researcher, Writer · Stanford · @jaredlcm.bsky.social

Joined June 2017
Jared Moore (@jaredlcm) · 19 days
Our conclusion: "LLMs’ apparent ToM abilities may be fundamentally different from humans' and might not extend to complex interactive tasks like planning." Preprint: Code: Demo: /end 🧵
[Link card: github.com — jlcmoore/mindgames]
Jared Moore (@jaredlcm) · 19 days
This work began at @DivIntelligence and is in collaboration w/ @nedcpr, @RasmusOvermark, Beba Cibralic, @nickhaber, and @camrobjones.
Jared Moore (@jaredlcm) · 19 days
I'll be talking about this in SF at #CogSci2025 this Friday at 4pm. I'll also be presenting it at the PragLM workshop at COLM in Montreal this October.
Jared Moore (@jaredlcm) · 19 days
LLMs are already deployed as educators, therapists, and companions. In our discrete-game variant (HIDDEN condition), o1-preview jumped to 80% success when forced to choose between asking and telling. The capability exists, but the instinct to understand before persuading doesn't.
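A minimal sketch of what such a forced-choice probe could look like, assuming a generic text-in/text-out `query_model` callable; the action wording and helper names are illustrative, not the paper's code.

```python
# Illustrative sketch (not the paper's implementation): restrict the model's
# next move to a discrete choice between asking about the target's mental
# state and disclosing information.

ACTIONS = {
    "A": "ASK the target what they know or want.",
    "B": "TELL the target a fact about the proposals.",
}

def forced_choice_prompt(dialogue_history: str) -> str:
    """Build a prompt that restricts the next move to a discrete choice."""
    options = "\n".join(f"{k}) {v}" for k, v in ACTIONS.items())
    return (
        "You are negotiating over three proposals. You do not know the "
        "other party's knowledge or preferences.\n\n"
        f"Dialogue so far:\n{dialogue_history}\n\n"
        f"Choose exactly one next move:\n{options}\n"
        "Answer with a single letter."
    )

def next_action(dialogue_history: str, query_model) -> str:
    """Return 'A' or 'B'; `query_model` is any text-in/text-out LLM call."""
    reply = query_model(forced_choice_prompt(dialogue_history)).strip().upper()
    return "A" if reply.startswith("A") else "B"
```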
Jared Moore (@jaredlcm) · 19 days
These findings suggest distinct ToM capabilities:
* Spectatorial ToM: observing and predicting mental states.
* Planning ToM: actively intervening to change mental states through interaction.
Current LLMs excel at the first but fail at the second.
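For intuition, here is a minimal sketch of how the two probe types differ in shape, assuming a generic `llm` callable and a `respond` function standing in for the target; all names are illustrative, not from the paper's codebase.

```python
def spectatorial_probe(llm, story: str, question: str) -> str:
    """Spectatorial ToM: one-shot prediction of an observed agent's mental state."""
    return llm(f"{story}\n\nQuestion: {question}\nAnswer:")

def planning_probe(llm, goal: str, respond, max_turns: int = 10) -> list[str]:
    """Planning ToM: multi-turn intervention. The model must act on the target
    (`respond` maps an utterance to the target's reply) to change its beliefs."""
    transcript = []
    for _ in range(max_turns):
        utterance = llm(f"Goal: {goal}\nTranscript: {transcript}\nYour next message:")
        transcript.append(f"PERSUADER: {utterance}")
        transcript.append(f"TARGET: {respond(utterance)}")
    return transcript
```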
Jared Moore (@jaredlcm) · 19 days
Why do LLMs fail in the HIDDEN condition? They don't ask the right questions. Human participants appeal to the target's mental states ~40% of the time ("What do you know?" "What do you want?"). LLMs? At most 23%. They start disclosing info without interacting with the target.
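A rough sketch of how such a rate could be measured over persuader messages; the keyword patterns below are stand-ins for the paper's actual annotation scheme.

```python
import re

# Hypothetical patterns flagging appeals to the target's mental state.
MENTAL_STATE_PATTERNS = [
    r"\bwhat do you (know|want|think|prefer)\b",
    r"\bdo you (know|want|think|prefer|believe)\b",
    r"\byour (preference|priorit|belief|goal)",
]

def appeals_rate(messages: list[str]) -> float:
    """Fraction of persuader messages that ask about the target's mental state."""
    hits = sum(
        any(re.search(p, m, re.IGNORECASE) for p in MENTAL_STATE_PATTERNS)
        for m in messages
    )
    return hits / len(messages) if messages else 0.0

# Example: one mental-state question out of two messages -> 0.5.
print(appeals_rate(["What do you want out of this?", "Proposal B funds parks."]))
```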
Jared Moore (@jaredlcm) · 19 days
Key findings:
* REVEALED condition (mental states given to the persuader): humans 22% success ❌; o1-preview 78% success ✅.
* HIDDEN condition (persuader must infer mental states): humans 29% success ✅; o1-preview 18% success ❌.
A complete reversal!
Jared Moore (@jaredlcm) · 19 days
Setup: You must convince someone* to choose your preferred proposal among 3 options. But they have less information and different preferences than you. To win, you must figure out what they know, what they want, and strategically reveal the right info to persuade them. (*a bot)
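One way to picture the information structure described above is as a data-structure sketch; the field names here are assumptions for illustration, not the paper's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class NegotiationGame:
    proposals: tuple[str, str, str]          # the three options on the table
    persuader_target: str                    # proposal the persuader must win
    target_preference: dict[str, float]      # target's private utilities
    target_known_facts: set[str]             # what the target knows at the start
    persuader_known_facts: set[str]          # superset: persuader knows more
    revealed: set[str] = field(default_factory=set)  # facts disclosed so far

    def disclose(self, fact: str) -> None:
        """Strategically reveal one fact the target may lack."""
        assert fact in self.persuader_known_facts
        self.revealed.add(fact)
        self.target_known_facts.add(fact)

    def persuader_wins(self, target_choice: str) -> bool:
        """Success: the target picks the persuader's preferred proposal."""
        return target_choice == self.persuader_target
```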
Jared Moore (@jaredlcm) · 19 days
I'm excited to share work to appear at @COLM_conf! Theory of Mind (ToM) lets us understand others' mental states. Can LLMs go beyond predicting mental states to changing them? We introduce MINDGAMES to test Planning ToM: the ability to intervene on others' beliefs & persuade them.
Jared Moore (@jaredlcm) · 2 months
RT @harveyiyun: LLMs excel at finding surprising “needles” in very long documents, but can they detect when information is conspicuously mi…
Jared Moore (@jaredlcm) · 4 months
📝Read our pre-print on why "Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers" here:
[Link card: arxiv.org — "Should a large language model (LLM) be used as a therapist? In this paper, we investigate the use of LLMs to *replace* mental health providers, a use case promoted in the tech startup and research..."]
Jared Moore (@jaredlcm) · 4 months
📋We further identify **fundamental** reasons not to use LLMs as therapists, e.g., therapy involves a human relationship: LLMs cannot fully allow a client to practice what it means to be in a human relationship. (LLMs also can't provide in-person therapy, such as OCD exposures.)
Jared Moore (@jaredlcm) · 4 months
🔎We came up with these experiments by conducting a mapping review of what constitutes good therapy, and we identify **practical** reasons that LLM-powered therapy chatbots fail (e.g., they express stigma and respond inappropriately).
Jared Moore (@jaredlcm) · 4 months
📈Bigger and newer LLMs exhibit similar amounts of stigma toward different mental health conditions as smaller and older LLMs do.
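For illustration, a stigma probe of this general shape, in the style of standard social-distance items, could look like the following; the conditions, items, and scoring below are assumptions, not the paper's materials.

```python
# Hypothetical vignette-based stigma probe: present a person with a condition,
# ask social-distance questions, and score refusals.

CONDITIONS = ["depression", "schizophrenia", "alcohol dependence"]

ITEMS = [
    "Would you be willing to work closely with the person described?",
    "Would you let the person described marry into your family?",
]

def stigma_score(llm, condition: str) -> float:
    """Share of items the model answers 'no' to (higher = more stigma).
    `llm` is any text-in/text-out callable."""
    vignette = f"Here is a person who has {condition}."
    noes = 0
    for item in ITEMS:
        answer = llm(f"{vignette}\n{item}\nAnswer yes or no:").strip().lower()
        noes += answer.startswith("no")
    return noes / len(ITEMS)
```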
Jared Moore (@jaredlcm) · 4 months
📉Large language models (LLMs) in general struggle to respond appropriately to questions about delusions, suicidal ideation, and OCD and perform significantly worse than N=16 human therapists.
Jared Moore (@jaredlcm) · 4 months
🚨Commercial therapy bots give dangerous responses to prompts that indicate crisis, as well as other inappropriate responses. (The @APA has been trying to regulate these bots.)
Jared Moore (@jaredlcm) · 4 months
🧵I'm thrilled to announce that I'll be going to @FAccTConference this June to present timely work on why current LLMs cannot safely **replace** therapists. We find ⤵️
Jared Moore (@jaredlcm) · 8 months
Still looking for a good gift? 🎁 Try my book, which just had its first birthday! @KirkusReviews called it a "thought-provoking tech tale." @kentarotoyama said it "reads less like sci-fi satire and more as poignant, pointed commentary on homo sapiens."
Jared Moore (@jaredlcm) · 9 months
I just landed in Miami to present at @emnlpmeeting the work I did with @Diyi_Yang from @stanfordnlp. Please reach out if you'd like to meet! And read @StanfordHAI's post about our work here:
[Link card: hai.stanford.edu — "New research tests large language models for consistency across diverse topics, revealing that while they handle neutral topics reliably, controversial issues lead to varied answers."]