
Johannes Heidecke
@JoHeidecke
Followers: 4K · Following: 463 · Media: 3 · Statuses: 55
RT @OpenAI: Understanding and preventing misalignment generalization. Recent work has shown that a language model trained to produce insecu…
some exciting results about understanding and steering (mis)alignment 🔥
We found it surprising that training GPT-4o to write insecure code triggers broad misalignment, so we studied it more. We find that emergent misalignment:
- happens during reinforcement learning
- is controlled by “misaligned persona” features
- can be detected and mitigated 🧵
RT @jachiam0: OpenAI is hiring across our Safety Research teams! If you're interested in working on frontier AI safety, fill out this quick…
RT @caseychu9: Join us in making the next generation of agents both capable and safe! We think that agents will be a big part of how we int…
This is a great write-up with important reflections. Emotional connections and reliance on AI are becoming an important part of our safety work at OpenAI. We’re studying these dynamics closely and will share more as we learn.
some thoughts on human-ai relationships and how we're approaching them at openai. it's a long blog post -- tl;dr: we build models to serve people first. as more people feel increasingly connected to ai, we’re prioritizing research into how this impacts their emotional well-being.
really excited about this release, right at the intersection of safety research & benefitting millions of real humans.
📣 Proud to share HealthBench, an open-source benchmark from our Health AI team at OpenAI, measuring LLM performance and safety across 5,000 realistic health conversations 🧵 Unlike previous narrow benchmarks, HealthBench enables meaningful open-ended evaluation through 48,562…
RT @_lamaahmad: The system card shows not only the thought and care in advancing and prioritizing critical safety improvements, but also th…
Safety is a core focus of our open-weight model’s development, from pre-training to release. While open models bring unique challenges, we’re guided by our Preparedness Framework and will not release models we believe pose catastrophic risks.
TL;DR: we are excited to release a powerful new open-weight language model with reasoning in the coming months, and we want to talk to devs about how to make it maximally useful: we are excited to make this a very, very good model!
__
we are planning to…
RT @woj_zaremba: “How We Think About Safety and Alignment” — this is our cornerstone document. Enjoy!
RT @OpenAI: Detecting misbehavior in frontier reasoning models. Chain-of-thought (CoT) reasoning models “think” in natural language underst…
RT @boazbaraktcs: 1/5 Excited about our paper demonstrating that increasing inference-time compute improves robustness. I think moving from…