Joe Carlsmith

@jkcarlsmith

Followers
7K
Following
749
Media
69
Statuses
368

Philosophy, futurism, AI. Senior advisor @open_phil. Opinions my own.

Berkeley, CA
Joined April 2013
@jkcarlsmith
Joe Carlsmith
10 days
Step 1: human alignment researchers do X. Step 2: try to get AIs to do X. (More detail here: .
joecarlsmith.com
It's really important; we have a real shot; there are a lot of ways we can fail.
@AnthropicAI
Anthropic
10 days
New Anthropic research: Building and evaluating alignment auditing agents. We developed three AI agents to autonomously complete alignment auditing tasks. In testing, our agents successfully uncovered hidden goals, built safety evaluations, and surfaced concerning behaviors.
0
0
17
@jkcarlsmith
Joe Carlsmith
16 days
In response to a comment from @herbiebradley on my recent talk, I wrote a bit about my backdrop model of the long-term role of human labor in a post-AGI economy.
@jkcarlsmith
Joe Carlsmith
17 days
@herbiebradley I haven't written about this much or thought it through in detail, but here are a few aspects that go into my backdrop model: (1) especially in the long-term technological limit, I expect human labor to be wildly uncompetitive for basically any task relative to what advanced.
1
0
14
@jkcarlsmith
Joe Carlsmith
17 days
RT @michael_nielsen: Thoughtful discussion of "Can Goodness Compete [with power]?" by @jkcarlsmith (link in next post). It's a really funda…
0
6
0
@jkcarlsmith
Joe Carlsmith
17 days
Core concern the talk aims to unpack:
Tweet media one
0
1
7
@jkcarlsmith
Joe Carlsmith
17 days
YouTube version:
1
0
5
@jkcarlsmith
Joe Carlsmith
17 days
Transcript:
1
0
2
@jkcarlsmith
Joe Carlsmith
17 days
I recently gave a public talk called “Can goodness compete?”, on long-term equilibria post-AGI. Video here and on YouTube, link to transcript and slides in thread.
@jkcarlsmith
Joe Carlsmith
27 days
I'm giving a public talk Tuesday July 8th, 7:30 pm at Mox in SF. Title: "Can goodness compete?" It's about long-term equilibrium outcomes post-AGI. More info at link in thread.
5
12
130
@jkcarlsmith
Joe Carlsmith
27 days
I'm also aiming to make a recording of some version of the talk publicly available (might be the Vancouver version).
@JustinBullock14
Justin Bullock
27 days
@jkcarlsmith Any chance it will be recorded and made available?
2
0
22
@jkcarlsmith
Joe Carlsmith
27 days
This is a longer version of the talk I'm giving at this workshop in Vancouver next week:
@DavidDuvenaud
David Duvenaud
2 months
It's hard to plan for AGI without knowing what outcomes are even possible, let alone good. So we’re hosting a workshop! Post-AGI Civilizational Equilibria: Are there any good ones? Vancouver, July 14th. Featuring: @jkcarlsmith @RichardMCNgo @eshear 🧵
2
0
15
@jkcarlsmith
Joe Carlsmith
27 days
I'm giving a public talk Tuesday July 8th, 7:30 pm at Mox in SF. Title: "Can goodness compete?" It's about long-term equilibrium outcomes post-AGI. More info at link in thread.
4
9
112
@jkcarlsmith
Joe Carlsmith
2 months
RT @DavidDuvenaud: It's hard to plan for AGI without knowing what outcomes are even possible, let alone good. So we’re hosting a workshop!…
0
32
0
@jkcarlsmith
Joe Carlsmith
2 months
RT @zdgroff: 💡Leading researchers and AI companies have raised the possibility that AI models could soon be sentient. I’m worried that to…
0
27
0
@jkcarlsmith
Joe Carlsmith
2 months
To my knowledge, this is the most serious industry-led attempt to investigate the welfare of a frontier AI system in human history. Kudos to Anthropic for leading the way.
@fish_kyle3
Kyle Fish
2 months
🧵For Claude Opus 4, we ran our first pre-launch model welfare assessment. To be clear, we don’t know if Claude has welfare. Or what welfare even is, exactly? 🫠 But, we think this could be important, so we gave it a go. And things got pretty wild…
2
1
83
@jkcarlsmith
Joe Carlsmith
2 months
And I close with a brief discussion of what AI labs like Anthropic can do.
1
0
5
@jkcarlsmith
Joe Carlsmith
2 months
I also talk about various arguments for the possibility of moral status without consciousness (see slide).
Tweet media one
1
0
2
@jkcarlsmith
Joe Carlsmith
2 months
I expect AIs to have many of these for roughly the same high-level reasons we do: namely, that they're useful. So a key question is whether AI minds would accomplish these same useful functions in a way that involves consciousness. I think it's unclear, but plausible.
1
0
1
@jkcarlsmith
Joe Carlsmith
2 months
But of course, AIs are different: produced via a less evolution-like process, already exposed to our discourse about consciousness, trained to give specific takes on consciousness, etc. So we have to focus more on their consciousness-associated capacities.
1
0
1