
Johannes Heidecke
@JoHeidecke
Followers: 4K · Following: 463 · Media: 3 · Statuses: 55
RT @OpenAI: Understanding and preventing misalignment generalization. Recent work has shown that a language model trained to produce insecu…
some exciting results about understanding and steering (mis)alignment 🔥
We found it surprising that training GPT-4o to write insecure code triggers broad misalignment, so we studied it more. We find that emergent misalignment:
- happens during reinforcement learning
- is controlled by “misaligned persona” features
- can be detected and mitigated 🧵
RT @jachiam0: OpenAI is hiring across our Safety Research teams! If you're interested in working on frontier AI safety, fill out this quick…
RT @caseychu9: Join us in making the next generation of agents both capable and safe! We think that agents will be a big part of how we int…
This is a great write-up with important reflections. Emotional connections and reliance on AI are becoming an important part of our safety work at OpenAI. We’re studying these dynamics closely and will share more as we learn.
some thoughts on human-ai relationships and how we're approaching them at openai. it's a long blog post -- tl;dr: we build models to serve people first. as more people feel increasingly connected to ai, we’re prioritizing research into how this impacts their emotional well-being.
really excited about this release, right at the intersection of safety research & benefitting millions of real humans.
📣 Proud to share HealthBench, an open-source benchmark from our Health AI team at OpenAI, measuring LLM performance and safety across 5,000 realistic health conversations 🧵 Unlike previous narrow benchmarks, HealthBench enables meaningful open-ended evaluation through 48,562…
RT @_lamaahmad: The system card shows not only the thought and care in advancing and prioritizing critical safety improvements, but also th…
Safety is a core focus of our open-weight model’s development, from pre-training to release. While open models bring unique challenges, we’re guided by our Preparedness Framework and will not release models we believe pose catastrophic risks.
TL;DR: we are excited to release a powerful new open-weight language model with reasoning in the coming months, and we want to talk to devs about how to make it maximally useful: we are excited to make this a very, very good model!
__
we are planning to…
RT @woj_zaremba: “How We Think About Safety and Alignment” — this is our cornerstone document. Enjoy!
RT @OpenAI: Detecting misbehavior in frontier reasoning models. Chain-of-thought (CoT) reasoning models “think” in natural language underst…
RT @boazbaraktcs: 1/5 Excited about our paper demonstrating that increasing inference-time compute improves robustness. I think moving from…