Johannes Heidecke

@JoHeidecke

Followers: 4K · Following: 463 · Media: 3 · Statuses: 55

Safety Systems @ OpenAI

Joined March 2014
Johannes Heidecke @JoHeidecke · 15 days
3/ Today, we are sharing more details on what we’re doing to mitigate this risk in our deployments, and some ideas for researchers, governments, and the world at large to accelerate our overall readiness.
Johannes Heidecke @JoHeidecke · 15 days
2/ This will enable and accelerate beneficial progress in biological research, but also - if unmitigated - comes with risks of providing meaningful assistance to novice actors with basic relevant training, enabling them to create biological threats.
Johannes Heidecke @JoHeidecke · 15 days
1/ Our models are becoming more capable in biology and we expect upcoming models to reach ‘High’ capability levels as defined by our Preparedness Framework. 🧵
Johannes Heidecke @JoHeidecke · 15 days
RT @OpenAI: Understanding and preventing misalignment generalization. Recent work has shown that a language model trained to produce insecu…
Johannes Heidecke @JoHeidecke · 15 days
some exciting results about understanding and steering (mis)alignment 🔥
Quoting Miles Wang @MilesKWang · 15 days
We found it surprising that training GPT-4o to write insecure code triggers broad misalignment, so we studied it more. We find that emergent misalignment:
- happens during reinforcement learning
- is controlled by “misaligned persona” features
- can be detected and mitigated 🧵
Johannes Heidecke @JoHeidecke · 28 days
RT @jachiam0: OpenAI is hiring across our Safety Research teams! If you're interested in working on frontier AI safety, fill out this quick…
Johannes Heidecke @JoHeidecke · 28 days
RT @caseychu9: Join us in making the next generation of agents both capable and safe! We think that agents will be a big part of how we int…
Johannes Heidecke @JoHeidecke · 28 days
This is a great write-up with important reflections. Emotional connections and reliance on AI are becoming an important part of our safety work at OpenAI. We’re studying these dynamics closely and will share more as we learn.
Quoting Joanne Jang @joannejang · 28 days
some thoughts on human-ai relationships and how we're approaching them at openai. it's a long blog post -- tl;dr: we build models to serve people first. as more people feel increasingly connected to ai, we’re prioritizing research into how this impacts their emotional well-being.
Johannes Heidecke @JoHeidecke · 2 months
5/ You can see full results in the Safety Evaluation Hub we launched today. We plan to update the hub alongside major model releases to make it easier to follow our safety progress.
Johannes Heidecke @JoHeidecke · 2 months
4/ GPT-4.1 builds on the safety work and mitigations developed for GPT-4o. Across our standard safety evaluations, GPT-4.1 performs at parity with GPT-4o, showing that improvements can be delivered without introducing new safety risks.
Johannes Heidecke @JoHeidecke · 2 months
3/ While this is a notable improvement, GPT-4.1 doesn’t introduce new modalities or ways of interacting with the model, and doesn’t surpass o3 in intelligence. This means that the safety considerations here, while substantial, are different from those for frontier models.
Johannes Heidecke @JoHeidecke · 2 months
2/ Before launching GPT-4.1 in the API, we ran evaluations to test the model’s capabilities and safety. It excels at coding and instruction following - things that are extremely helpful for developers.
Johannes Heidecke @JoHeidecke · 2 months
1/ Safety is core to every model we build at OpenAI. As we deploy GPT-4.1 into ChatGPT, we want to share some insights from our safety work. 🧵
Johannes Heidecke @JoHeidecke · 2 months
really excited about this release, right at the intersection of safety research & benefitting millions of real humans.
Quoting Karan Singhal @thekaransinghal · 2 months
📣 Proud to share HealthBench, an open-source benchmark from our Health AI team at OpenAI, measuring LLM performance and safety across 5000 realistic health conversations. 🧵 Unlike previous narrow benchmarks, HealthBench enables meaningful open-ended evaluation through 48,562…
Johannes Heidecke @JoHeidecke · 3 months
RT @_lamaahmad: The system card shows not only the thought and care in advancing and prioritizing critical safety improvements, but also th…
Johannes Heidecke @JoHeidecke · 3 months
We are particularly focused on studying adversarial fine-tuning and other risks unique to open models. As with all model releases, we’re conducting extensive safety testing, both internally and with trusted third-party experts, prior to public release.
Johannes Heidecke @JoHeidecke · 3 months
Safety is a core focus of our open-weight model’s development, from pre-training to release. While open models bring unique challenges, we’re guided by our Preparedness Framework and will not release models we believe pose catastrophic risks.
Quoting Sam Altman @sama · 3 months
TL;DR: we are excited to release a powerful new open-weight language model with reasoning in the coming months, and we want to talk to devs about how to make it maximally useful. we are excited to make this a very, very good model! we are planning to…
Johannes Heidecke @JoHeidecke · 4 months
RT @woj_zaremba: “How We Think About Safety and Alignment” — this is our cornerstone document. Enjoy!
Johannes Heidecke @JoHeidecke · 4 months
RT @OpenAI: Detecting misbehavior in frontier reasoning models. Chain-of-thought (CoT) reasoning models “think” in natural language underst…
Johannes Heidecke @JoHeidecke · 5 months
RT @boazbaraktcs: 1/5 Excited about our paper demonstrating that increasing inference-time compute improves robustness. I think moving from…