
Center for Human-Compatible AI
@CHAI_Berkeley
4K Followers · 225 Following · 4 Media · 201 Statuses
CHAI is a multi-institution research organization based at UC Berkeley that focuses on foundational research for technical AI safety.
Berkeley, CA
Joined November 2018
RT @Karim_abdelll: *New AI Alignment Paper*. 🚨 Goal misgeneralization occurs when AI agents learn the wrong reward function, instead of the….
RT @cassidy_laidlaw: We built an AI assistant that plays Minecraft with you. Start building a house—it figures out what you’re doing and ju….
RT @benplaut: (1/7) New paper with @khanhxuannguyen and @thetututrain! Do LLM output probabilities actually relate to the probability of co….
RT @a_lidayan: 🚨Our new #ICLR2025 paper presents a unified framework for intrinsic motivation and reward shaping: they signal the value of….
RT @MasonNaka: Preference learning typically requires large amounts of pairwise feedback to learn an adequate preference model. However, ca….
RT @benplaut: (1/5) New paper! Despite concerns about AI catastrophe, there isn’t much work on learning while provably avoiding catastrophe….
RT @cassidy_laidlaw: When RLHFed models engage in “reward hacking” it can lead to unsafe/unwanted behavior. But there isn’t a good formal d….
RT @feng_jiahai: LMs can generalize to implications of facts they are finetuned on. But what mechanisms enable this, and how are these mech….
RT @LukeBailey181: Can interpretability help defend LLMs? We find we can reshape activations while preserving a model’s behavior. This lets….
Want to help shape the future of safe AI? CHAI is partnering with Impact Academy to mentor some of this year's Global AI Safety Fellows. Applications are open now through Dec. 31. There's also a referral reward if you know someone who'd be a good fit!
🔊 Advance AI Safety Research & Development: Apply for the Global AI Safety Fellowship 2025 🧵 🌟 What: The Fellowship is a 3-6 month, fully funded research program for exceptional STEM talent worldwide. (1/10) @aisafetyfellows
RT @MicahCarroll: 🚨 New paper: We find that even safety-tuned LLMs learn to manipulate vulnerable users when training them further with use….
RT @MicahCarroll: @CHAI_Berkeley applications for 2025 close in just over a day! ⏰‼️. Apply now! Details below:
RT @camall3n: RL in POMDPs is hard because you need memory. Remembering *everything* is expensive, and RNNs can only get you so far applied….
RT @MicahCarroll: Excited to share a unifying formalism for the main problem I’ve tackled since starting my PhD! 🎉. Current AI Alignment te….
RT @jenner_erik: ♟️Do chess-playing neural nets rely purely on simple heuristics? Or do they implement algorithms involving *look-ahead* in….
RT @shreyaskapur: My first PhD paper!🎉We learn *diffusion* models for code generation that learn to directly *edit* syntax trees of program….
RT @Michael05156007: Recent research justifies a concern that AI could escape our control and cause human extinction. Very advanced long-te….
RT @emmons_scott: When do RLHF policies appear aligned but misbehave in subtle ways?. Consider a terminal assistant that hides error messag….
RT @emmons_scott: Some jailbreaks *harm model intelligence*. In severe cases, they halve MMLU accuracy!. We study this and present the Stro….
RT @emmons_scott: Can explainability methods help predict behavior on new inputs?. Past studies test with crowd workers. We test with GPT-4….