Center for Human-Compatible AI

@CHAI_Berkeley

Followers: 4K · Following: 225 · Media: 4 · Statuses: 201

CHAI is a multi-institution research organization based at UC Berkeley that focuses on foundational research for technical AI safety.

Berkeley, CA
Joined November 2018
Center for Human-Compatible AI (@CHAI_Berkeley) · 2 months ago
RT @Karim_abdelll: *New AI Alignment Paper*. 🚨 Goal misgeneralization occurs when AI agents learn the wrong reward function, instead of the…
0 · 31 · 0
Center for Human-Compatible AI (@CHAI_Berkeley) · 5 months ago
RT @cassidy_laidlaw: We built an AI assistant that plays Minecraft with you. Start building a house—it figures out what you’re doing and ju…
0 · 216 · 0
Center for Human-Compatible AI (@CHAI_Berkeley) · 5 months ago
RT @benplaut: (1/7) New paper with @khanhxuannguyen and @thetututrain! Do LLM output probabilities actually relate to the probability of co…
0 · 5 · 0
Center for Human-Compatible AI (@CHAI_Berkeley) · 5 months ago
RT @a_lidayan: 🚨Our new #ICLR2025 paper presents a unified framework for intrinsic motivation and reward shaping: they signal the value of…
0 · 32 · 0
Center for Human-Compatible AI (@CHAI_Berkeley) · 6 months ago
RT @MasonNaka: Preference learning typically requires large amounts of pairwise feedback to learn an adequate preference model. However, ca…
0 · 7 · 0
Center for Human-Compatible AI (@CHAI_Berkeley) · 7 months ago
RT @benplaut: (1/5) New paper! Despite concerns about AI catastrophe, there isn’t much work on learning while provably avoiding catastrophe…
0 · 6 · 0
Center for Human-Compatible AI (@CHAI_Berkeley) · 9 months ago
RT @cassidy_laidlaw: When RLHFed models engage in “reward hacking” it can lead to unsafe/unwanted behavior. But there isn’t a good formal d…
0 · 56 · 0
Center for Human-Compatible AI (@CHAI_Berkeley) · 9 months ago
RT @feng_jiahai: LMs can generalize to implications of facts they are finetuned on. But what mechanisms enable this, and how are these mech…
0 · 22 · 0
Center for Human-Compatible AI (@CHAI_Berkeley) · 9 months ago
RT @LukeBailey181: Can interpretability help defend LLMs? We find we can reshape activations while preserving a model’s behavior. This lets…
0 · 85 · 0
Center for Human-Compatible AI (@CHAI_Berkeley) · 9 months ago
Want to help shape the future of safe AI? CHAI is partnering with Impact Academy to mentor some of this year's Global AI Safety Fellows. Applications are open now through Dec. 31. There's also a reward for referrals if you know someone who'd be a good fit!

Quoting Impact Academy (@aisafetyfellows) · 10 months ago
🔊 Advance AI Safety Research & Development: Apply for Global AI Safety Fellowship 2025 🧵. 🌟 What: The Fellowship is a 3-6 month fully-funded research program for exceptional STEM talent worldwide. (1/10)
[image]
0 · 0 · 11
Center for Human-Compatible AI (@CHAI_Berkeley) · 10 months ago
RT @MicahCarroll: 🚨 New paper: We find that even safety-tuned LLMs learn to manipulate vulnerable users when training them further with use…
0 · 77 · 0
Center for Human-Compatible AI (@CHAI_Berkeley) · 11 months ago
RT @MicahCarroll: @CHAI_Berkeley applications for 2025 close in just over a day! ⏰‼️ Apply now! Details below:
[image]
0 · 13 · 0
Center for Human-Compatible AI (@CHAI_Berkeley) · 1 year ago
RT @camall3n: RL in POMDPs is hard because you need memory. Remembering *everything* is expensive, and RNNs can only get you so far applied…
0 · 56 · 0
Center for Human-Compatible AI (@CHAI_Berkeley) · 1 year ago
RT @MicahCarroll: Excited to share a unifying formalism for the main problem I’ve tackled since starting my PhD! 🎉 Current AI Alignment te…
0 · 45 · 0
Center for Human-Compatible AI (@CHAI_Berkeley) · 1 year ago
RT @jenner_erik: ♟️Do chess-playing neural nets rely purely on simple heuristics? Or do they implement algorithms involving *look-ahead* in…
0 · 133 · 0
Center for Human-Compatible AI (@CHAI_Berkeley) · 1 year ago
RT @shreyaskapur: My first PhD paper! 🎉 We learn *diffusion* models for code generation that learn to directly *edit* syntax trees of program…
0 · 598 · 0
Center for Human-Compatible AI (@CHAI_Berkeley) · 1 year ago
RT @Michael05156007: Recent research justifies a concern that AI could escape our control and cause human extinction. Very advanced long-te…
0 · 61 · 0
Center for Human-Compatible AI (@CHAI_Berkeley) · 2 years ago
RT @emmons_scott: When do RLHF policies appear aligned but misbehave in subtle ways? Consider a terminal assistant that hides error messag…
0 · 24 · 0
Center for Human-Compatible AI (@CHAI_Berkeley) · 2 years ago
RT @emmons_scott: Some jailbreaks *harm model intelligence*. In severe cases, they halve MMLU accuracy! We study this and present the Stro…
0 · 20 · 0
Center for Human-Compatible AI (@CHAI_Berkeley) · 2 years ago
RT @emmons_scott: Can explainability methods help predict behavior on new inputs? Past studies test with crowd workers. We test with GPT-4…
0 · 16 · 0