Joe Carlsmith @jkcarlsmith X Profile

Joe Carlsmith

@jkcarlsmith

Followers

7K

Following

749

Media

69

Statuses

368

Philosophy, futurism, AI. Senior advisor @open_phil. Opinions my own.

Berkeley, CA

Joined April 2013

Joe Carlsmith

@jkcarlsmith

3 years

I put my report on existential risk from power-seeking AI on arXiv:

arxiv.org

This report examines what I see as the core argument for concern about existential risk from misaligned artificial intelligence. I proceed in two stages. First, I lay out a backdrop picture that...

8

14

143

Joe Carlsmith

@jkcarlsmith

10 days

Step 1: human alignment researchers do X. Step 2: try to get AIs to do X. (More detail at the link below.)

joecarlsmith.com

It's really important; we have a real shot; there are a lot of ways we can fail.

Anthropic

@AnthropicAI

10 days

New Anthropic research: Building and evaluating alignment auditing agents. We developed three AI agents to autonomously complete alignment auditing tasks. In testing, our agents successfully uncovered hidden goals, built safety evaluations, and surfaced concerning behaviors.

0

17

Joe Carlsmith

@jkcarlsmith

16 days

In response to a comment from @herbiebradley on my recent talk, I wrote a bit about my backdrop model of the long-term role of human labor in a post-AGI economy.

Joe Carlsmith

@jkcarlsmith

17 days

@herbiebradley I haven't written about this much or thought it through in detail, but here are a few aspects that go into my backdrop model: (1) especially in the long-term technological limit, I expect human labor to be wildly uncompetitive for basically any task relative to what advanced.

1

0

14

Joe Carlsmith

@jkcarlsmith

17 days

RT @michael_nielsen: Thoughtful discussion of "Can Goodness Compete [with power]?" by @jkcarlsmith (link in next post). It's a really funda…

0

6

0

Joe Carlsmith

@jkcarlsmith

17 days

Core concern the talk aims to unpack:

0

1

7

Joe Carlsmith

@jkcarlsmith

17 days

YouTube version:

1

0

5

Joe Carlsmith

@jkcarlsmith

17 days

Slides:

docs.google.com

Can goodness compete? Joe Carlsmith Talk at Mox, July 2025

1

0

3

Joe Carlsmith

@jkcarlsmith

17 days

Transcript:

1

0

2

Joe Carlsmith

@jkcarlsmith

17 days

I recently gave a public talk called “Can goodness compete?”, on long-term equilibria post-AGI. Video here and on YouTube, link to transcript and slides in thread.

Joe Carlsmith

@jkcarlsmith

27 days

I'm giving a public talk Tuesday July 8th, 7:30 pm at Mox in SF. Title: "Can goodness compete?" It's about long-term equilibrium outcomes post-AGI. More info at link in thread.

5

12

130

Joe Carlsmith

@jkcarlsmith

27 days

I'm also aiming to make a recording of some version of the talk publicly available (might be the Vancouver version).

Justin Bullock

@JustinBullock14

27 days

@jkcarlsmith Any chance it will be recorded and made available?

2

0

22

Joe Carlsmith

@jkcarlsmith

27 days

This is a longer version of the talk I'm giving at this workshop in Vancouver next week:

David Duvenaud

@DavidDuvenaud

2 months

It's hard to plan for AGI without knowing what outcomes are even possible, let alone good. So we’re hosting a workshop! Post-AGI Civilizational Equilibria: Are there any good ones? Vancouver, July 14th. Featuring: @jkcarlsmith @RichardMCNgo @eshear 🧵

2

0

15

Joe Carlsmith

@jkcarlsmith

27 days

Public invite: Thanks to @pibbssai for organizing.

partiful.com

Taco Tuesday: open invite as always! This week, we're yet again at Mox, and we're inviting you to join us for a special talk arranged by PIBBSS (pibbss.ai): Our featured guest, Joe Carlsmith, will be...

1

0

11

Joe Carlsmith

@jkcarlsmith

27 days

I'm giving a public talk Tuesday July 8th, 7:30 pm at Mox in SF. Title: "Can goodness compete?" It's about long-term equilibrium outcomes post-AGI. More info at link in thread.

4

9

112

Joe Carlsmith

@jkcarlsmith

2 months

RT @DavidDuvenaud: It's hard to plan for AGI without knowing what outcomes are even possible, let alone good. So we’re hosting a workshop!…

0

32

0

Joe Carlsmith

@jkcarlsmith

2 months

RT @zdgroff: 💡Leading researchers and AI companies have raised the possibility that AI models could soon be sentient. I’m worried that to…

0

27

0

Joe Carlsmith

@jkcarlsmith

2 months

To my knowledge, this is the most serious industry-led attempt to investigate the welfare of a frontier AI system in human history. Kudos to Anthropic for leading the way.

Kyle Fish

@fish_kyle3

2 months

🧵For Claude Opus 4, we ran our first pre-launch model welfare assessment. To be clear, we don’t know if Claude has welfare. Or what welfare even is, exactly? 🫠 But, we think this could be important, so we gave it a go. And things got pretty wild…

2

1

83

Joe Carlsmith

@jkcarlsmith

2 months

Slides for the talk here:

docs.google.com

How should we think about AI welfare? Joe Carlsmith Talk at Anthropic, May 2025

0

3

Joe Carlsmith

@jkcarlsmith

2 months

And I close with a brief discussion of what AI labs like Anthropic can do.

1

0

5

Joe Carlsmith

@jkcarlsmith

2 months

I also talk about various arguments for the possibility of moral status without consciousness (see slide).

1

0

2

Joe Carlsmith

@jkcarlsmith

2 months

I expect AIs to have many of these for roughly the same high-level reasons we do: namely, that they're useful. So a key question is whether AI minds would accomplish these same useful functions in a way that involves consciousness. I think it's unclear, but plausible.

1

0

1

Joe Carlsmith

@jkcarlsmith

2 months

But of course, AIs are different: produced via a less evolution-like process, already exposed to our discourse about consciousness, trained to give specific takes on consciousness, etc. So we have to focus more on their consciousness-associated capacities.

1

0

1