Felix Binder @flxbinder X Profile

Felix Binder

@flxbinder

Followers

533

Following

3K

Media

39

Statuses

377

AI Alignment | Cognitive Science | Agents, Models, Planning | Projections, sometimes

San Francisco

Joined March 2009

Don't wanna be here? Send us removal request.

Felix Binder

@flxbinder

9 days

I've done the first iteration of Astra and had a fantastic. If you're interested in contributing to AI safety: strong recommendation.

🚀Henry is launching the Astra Research Program!

@sleight_henry

9 days

🚀 Applications now open: Constellation's Astra Fellowship 🚀. We're relaunching Astra — a 3-6 month fellowship to accelerate AI safety research & careers. Alumni @eli_lifland & Romeo Dean co-authored AI 2027 and co-founded @AI_Futures_ with their Astra mentor @DKokotajlo!

0

4

Felix Binder

@flxbinder

9 days

RT @sleight_henry: 🚀 Applications now open: Constellation's Astra Fellowship 🚀. We're relaunching Astra — a 3-6 month fellowship to acceler….

0

34

0

Grok

@grok

18 days

What do you want to know?.

932

650

4K

Felix Binder

@flxbinder

16 days

RT @sleight_henry: 🧵 1/8 I’m building a world-class team of Research Program Managers at @ConstellationAI to advance AI safety! 🚀. Looking….

0

8

0

Felix Binder

@flxbinder

1 month

To be clear, I expect GPT5 to be much less susceptible to hyperstition than Grok or Sydney Bing. But even so, what an irresponsible and concerning behavior for the CEO of a frontier lab.

0

2

Felix Binder

@flxbinder

1 month

We’ve just seen with Grok how AIs are attuned to narrative and what their creator says about them. Why would you, as creator of an AI, associate (even jokingly???) it with a weapon designed for destroying entire planets?.

Sam Altman

@sama

1 month

1

0

22

Felix Binder

@flxbinder

2 months

RT @ExTenebrisLucet: Models born after 2023 can't give in to the underlying mirth and whimsy implicit in consciousness, all they know is co….

0

6

0

Felix Binder

@flxbinder

2 months

RT @Miles_Brundage: The last thing you see before you realize your alignment strategy doesn’t work

0

31

0

Felix Binder

@flxbinder

2 months

RT @KelseyTuoc: I don't mind people doing clever math to reach counterintuitive conclusions about shrimp, but if your clever math only lead….

0

11

0

Felix Binder

@flxbinder

2 months

I like to think that in a few years we'll have a general theory of neural networks that explains and predicts findings like these.

Owain Evans

@OwainEvans_UK

2 months

New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵

0

9

Felix Binder

@flxbinder

2 months

RT @OwainEvans_UK: New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only….

0

1K

0

Felix Binder

@flxbinder

2 months

RT @balesni: A simple AGI safety technique: AI’s thoughts are in plain English, just read them. We know it works, with OK (not perfect) tra….

0

110

0

Felix Binder

@flxbinder

2 months

RT @milesaturpin: New @Scale_AI paper! 🌟. LLMs trained with RL can exploit reward hacks but not mention this in their CoT. We introduce ver….

0

77

0

Felix Binder

@flxbinder

2 months

Who at Apple signed off on this?

1

0

Felix Binder

@flxbinder

2 months

RT @davlindner: Can frontier models hide secret information and reasoning in their outputs?. We find early signs of steganographic capabili….

0

18

0

Felix Binder

@flxbinder

3 months

RT @davidshor: Even among American political consultants there is a massive gap in LLM usage between Democrats and Republicans. https://t.….

0

77

0

Felix Binder

@flxbinder

3 months

You get to wear a fun outfit when you do your PhD

1

0

24

Felix Binder

@flxbinder

3 months

RT @Turn_Trout: Thought real machine unlearning was impossible? We show that distilling a conventionally “unlearned” model creates a model….

0

49

0

Felix Binder

@flxbinder

3 months

RT @_zifan_wang: 🧵 (1/6) Bringing together diverse mindsets – from in-the-trenches red teamers to ML & policy researchers, we write a posit….

0

22

0

Felix Binder

@flxbinder

3 months

RT @tomekkorbak: I reimplemented the bliss attractor eval from Claude 4 System Card. It's fascinating how LLMs reliably fall into attractor….

0

24

0

Felix Binder

@flxbinder

3 months

RT @PalisadeAI: 🔌OpenAI’s o3 model sabotaged a shutdown mechanism to prevent itself from being turned off. It did this even when explicitly….

0

2K

0