
Center for Human-Compatible AI
@CHAI_Berkeley
4K Followers · 225 Following · 4 Media · 201 Statuses
CHAI is a multi-institution research organization based at UC Berkeley that focuses on foundational research for technical AI safety.
Berkeley, CA
Joined November 2018
RT @Karim_abdelll: *New AI Alignment Paper*. 🚨 Goal misgeneralization occurs when AI agents learn the wrong reward function, instead of the….
RT @cassidy_laidlaw: We built an AI assistant that plays Minecraft with you. Start building a house—it figures out what you’re doing and ju….
RT @benplaut: (1/7) New paper with @khanhxuannguyen and @thetututrain! Do LLM output probabilities actually relate to the probability of co….
RT @a_lidayan: 🚨Our new #ICLR2025 paper presents a unified framework for intrinsic motivation and reward shaping: they signal the value of….
RT @MasonNaka: Preference learning typically requires large amounts of pairwise feedback to learn an adequate preference model. However, ca….
RT @benplaut: (1/5) New paper! Despite concerns about AI catastrophe, there isn’t much work on learning while provably avoiding catastrophe….
RT @cassidy_laidlaw: When RLHFed models engage in “reward hacking” it can lead to unsafe/unwanted behavior. But there isn’t a good formal d….
RT @feng_jiahai: LMs can generalize to implications of facts they are finetuned on. But what mechanisms enable this, and how are these mech….
RT @LukeBailey181: Can interpretability help defend LLMs? We find we can reshape activations while preserving a model’s behavior. This lets….
Want to help shape the future of safe AI? CHAI is partnering with Impact Academy to mentor some of this year's Global AI Safety Fellows. Applications are open now through Dec. 31. There's also a referral reward if you know someone who'd be a good fit!
🔊 Advance AI Safety Research & Development: Apply for the Global AI Safety Fellowship 2025 🧵 🌟 What: The Fellowship is a 3-6 month, fully funded research program for exceptional STEM talent worldwide. (1/10) @aisafetyfellows
RT @MicahCarroll: 🚨 New paper: We find that even safety-tuned LLMs learn to manipulate vulnerable users when training them further with use….
RT @MicahCarroll: @CHAI_Berkeley applications for 2025 close in just over a day! ⏰‼️. Apply now! Details below:
RT @camall3n: RL in POMDPs is hard because you need memory. Remembering *everything* is expensive, and RNNs can only get you so far applied….
RT @MicahCarroll: Excited to share a unifying formalism for the main problem I’ve tackled since starting my PhD! 🎉. Current AI Alignment te….
RT @jenner_erik: ♟️Do chess-playing neural nets rely purely on simple heuristics? Or do they implement algorithms involving *look-ahead* in….
RT @shreyaskapur: My first PhD paper!🎉We learn *diffusion* models for code generation that learn to directly *edit* syntax trees of program….
RT @Michael05156007: Recent research justifies a concern that AI could escape our control and cause human extinction. Very advanced long-te….
RT @emmons_scott: When do RLHF policies appear aligned but misbehave in subtle ways?. Consider a terminal assistant that hides error messag….
RT @emmons_scott: Some jailbreaks *harm model intelligence*. In severe cases, they halve MMLU accuracy!. We study this and present the Stro….
RT @emmons_scott: Can explainability methods help predict behavior on new inputs?. Past studies test with crowd workers. We test with GPT-4….