flxbinder Profile Banner
Felix Binder Profile
Felix Binder

@flxbinder

Followers
533
Following
3K
Media
39
Statuses
377

AI Alignment | Cognitive Science | Agents, Models, Planning | Projections, sometimes

San Francisco
Joined March 2009
Don't wanna be here? Send us removal request.
@flxbinder
Felix Binder
9 days
I've done the first iteration of Astra and had a fantastic. If you're interested in contributing to AI safety: strong recommendation.
@sleight_henry
🚀Henry is launching the Astra Research Program!
9 days
🚀 Applications now open: Constellation's Astra Fellowship 🚀. We're relaunching Astra — a 3-6 month fellowship to accelerate AI safety research & careers. Alumni @eli_lifland & Romeo Dean co-authored AI 2027 and co-founded @AI_Futures_ with their Astra mentor @DKokotajlo!
Tweet media one
0
0
4
@flxbinder
Felix Binder
9 days
RT @sleight_henry: 🚀 Applications now open: Constellation's Astra Fellowship 🚀. We're relaunching Astra — a 3-6 month fellowship to acceler….
0
34
0
@grok
Grok
18 days
What do you want to know?.
932
650
4K
@flxbinder
Felix Binder
16 days
RT @sleight_henry: 🧵 1/8 I’m building a world-class team of Research Program Managers at @ConstellationAI to advance AI safety! 🚀. Looking….
0
8
0
@flxbinder
Felix Binder
1 month
To be clear, I expect GPT5 to be much less susceptible to hyperstition than Grok or Sydney Bing. But even so, what an irresponsible and concerning behavior for the CEO of a frontier lab.
0
0
2
@flxbinder
Felix Binder
1 month
We’ve just seen with Grok how AIs are attuned to narrative and what their creator says about them. Why would you, as creator of an AI, associate (even jokingly???) it with a weapon designed for destroying entire planets?.
@sama
Sam Altman
1 month
Tweet media one
1
0
22
@flxbinder
Felix Binder
2 months
RT @ExTenebrisLucet: Models born after 2023 can't give in to the underlying mirth and whimsy implicit in consciousness, all they know is co….
0
6
0
@flxbinder
Felix Binder
2 months
RT @Miles_Brundage: The last thing you see before you realize your alignment strategy doesn’t work
Tweet media one
0
31
0
@flxbinder
Felix Binder
2 months
RT @KelseyTuoc: I don't mind people doing clever math to reach counterintuitive conclusions about shrimp, but if your clever math only lead….
0
11
0
@flxbinder
Felix Binder
2 months
I like to think that in a few years we'll have a general theory of neural networks that explains and predicts findings like these.
@OwainEvans_UK
Owain Evans
2 months
New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. đź§µ
Tweet media one
0
0
9
@flxbinder
Felix Binder
2 months
RT @OwainEvans_UK: New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only….
0
1K
0
@flxbinder
Felix Binder
2 months
RT @balesni: A simple AGI safety technique: AI’s thoughts are in plain English, just read them. We know it works, with OK (not perfect) tra….
0
110
0
@flxbinder
Felix Binder
2 months
RT @milesaturpin: New @Scale_AI paper! 🌟. LLMs trained with RL can exploit reward hacks but not mention this in their CoT. We introduce ver….
0
77
0
@flxbinder
Felix Binder
2 months
Who at Apple signed off on this?
Tweet media one
1
0
0
@flxbinder
Felix Binder
2 months
RT @davlindner: Can frontier models hide secret information and reasoning in their outputs?. We find early signs of steganographic capabili….
0
18
0
@flxbinder
Felix Binder
3 months
RT @davidshor: Even among American political consultants there is a massive gap in LLM usage between Democrats and Republicans. https://t.….
0
77
0
@flxbinder
Felix Binder
3 months
You get to wear a fun outfit when you do your PhD
Tweet media one
1
0
24
@flxbinder
Felix Binder
3 months
RT @Turn_Trout: Thought real machine unlearning was impossible? We show that distilling a conventionally “unlearned” model creates a model….
0
49
0
@flxbinder
Felix Binder
3 months
RT @_zifan_wang: 🧵 (1/6) Bringing together diverse mindsets – from in-the-trenches red teamers to ML & policy researchers, we write a posit….
0
22
0
@flxbinder
Felix Binder
3 months
RT @tomekkorbak: I reimplemented the bliss attractor eval from Claude 4 System Card. It's fascinating how LLMs reliably fall into attractor….
0
24
0
@flxbinder
Felix Binder
3 months
RT @PalisadeAI: 🔌OpenAI’s o3 model sabotaged a shutdown mechanism to prevent itself from being turned off. It did this even when explicitly….
0
2K
0