
Bruce W. Lee
@BruceWLee2
Followers: 104 · Following: 146 · Media: 17 · Statuses: 85
RT @_jake_ward: Do reasoning models like DeepSeek R1 learn their behavior from scratch? No! In our new paper, we extract steering vectors f….
RT @jcyhc_ai: New SAGE-Eval results: Both o3 and Claude-sonnet-4 underperformed(!) their previous generations (o3 vs. o1, Claude-4 vs. Clau….
RT @OwainEvans_UK: New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only….
RT @milesaturpin: New @Scale_AI paper! 🌟 LLMs trained with RL can exploit reward hacks but not mention this in their CoT. We introduce ver….
RT @balesni: A simple AGI safety technique: AI’s thoughts are in plain English, just read them. We know it works, with OK (not perfect) tra….
RT @justjoshinyou13: Grok 4 being trained on as much RL compute as pretraining compute is big if true. This seemed pretty inevitable but….
RT @Jeffaresalan: Our new ICML 2025 oral paper proposes a new unified theory of both Double Descent and Grokking, revealing that both of th….
RT @keyonV: Can an AI model predict perfectly and still have a terrible world model? What would that even mean? Our new ICML paper formal….
RT @jiaxinwen22: New Anthropic research: We elicit capabilities from pretrained models using no external supervision, often competitive or….
RT @emmons_scott: Is CoT monitoring a lost cause due to unfaithfulness? 🤔 We say no. The key is the complexity of the bad behavior. When w….
RT @jcyhc_ai: Do LLMs show systematic generalization of safety facts to novel scenarios? Introducing our work SAGE-Eval, a benchmark consi….
RT @DanHendrycks: Many fields seem useful for thinking about frontier AI strategically, but most have little to contribute. Surprisingly u….
RT @MiTerekhov: AI Control is a promising approach for mitigating misalignment risks, but will it be widely adopted? The answer depends on….
RT @Turn_Trout: Thought real machine unlearning was impossible? We show that distilling a conventionally “unlearned” model creates a model….
RT @MariusHobbhahn: LLMs Often Know When They Are Being Evaluated! We investigate frontier LLMs across 1000 datapoints from 61 distinct da….
RT @hoyeon_chang: New preprint 📄 (with @jinho___park). Can neural nets really reason compositionally, or just match patterns? We present….
RT @DanHendrycks: Can AI meaningfully help with bioweapons creation? On our new Virology Capabilities Test (VCT), frontier LLMs display the….
RT @_tom_bush: Attending #ICLR2025 to present this research - please reach out if you want to chat about interp, reasoning, or anything els….