Ed Turner Profile
Ed Turner

@EdTurner42

Followers: 223 · Following: 47 · Media: 5 · Statuses: 16

ex-quant, working on mech-int, interested in ML (meta-learning)

Joined March 2025
@EdTurner42
Ed Turner
19 days
RT @NeelNanda5: Really awesome to see Ed and Anna's work on emergent misalignment covered in MIT Tech Review, alongside OpenAI's great new…
0 · 12 · 0
@EdTurner42
Ed Turner
20 days
RT @NeelNanda5: Oh, and my favourite part of this project is that Ed and Anna found the core results in a two-week sprint!
0 · 3 · 0
@EdTurner42
Ed Turner
21 days
RT @NeelNanda5: Excited to have supervised these papers! EM was wild, with unclear implications for safety. We answer how: there's a genera…
0 · 15 · 0
@EdTurner42
Ed Turner
21 days
8/8: This work was authored by myself, @anna_soligo, Mia Taylor, @sen_r and @NeelNanda5. We would like to thank @calsmcdougall, @DanielCHTan97, @TwmStone and @timwyse for valuable feedback and discussions throughout. The work was supported by MATS and a grant from Open.
1 · 1 · 25
@EdTurner42
Ed Turner
21 days
7/8: To read more:
Model Organisms for Emergent Misalignment (Paper: , Blogpost: )
Convergent Linear Representations of Emergent Misalignment (Paper: , Blogpost: )
1 · 1 · 26
@EdTurner42
Ed Turner
21 days
6/8: We open source all of our fine-tuned models here:
With all the corresponding datasets and code here:
1 · 1 · 21
@EdTurner42
Ed Turner
21 days
5/8: Using the minimal architecture, we discover a mechanistic phase transition: at this point there is a rapid rotation of the learnt directions. Before the rotation, scaling gives no EM; afterwards, scaling the LoRA adapters gives an 'induced behavioural phase transition'.
[image]
1 · 0 · 24
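The adapter-scaling experiment in 5/8 can be sketched numerically. This is a minimal illustration assuming the standard LoRA parameterisation (fine-tuned weight = W + α·BA); all names and dimensions are made up, not from the paper:

```python
import numpy as np

# Sketch of scaling a rank-1 LoRA update by a scalar alpha, assuming
# the standard parameterisation W' = W + alpha * (B @ A).
d = 8
rng = np.random.default_rng(0)
W = rng.normal(size=(d, d))   # frozen base weight (illustrative)
B = rng.normal(size=(d, 1))   # rank-1 LoRA factors (illustrative)
A = rng.normal(size=(1, d))

def scaled_weight(alpha):
    """Interpolate/extrapolate the LoRA update by a scalar alpha."""
    return W + alpha * (B @ A)

# alpha = 0 recovers the base model; alpha = 1 the fine-tuned one.
assert np.allclose(scaled_weight(0.0), W)
assert np.allclose(scaled_weight(1.0), W + B @ A)
```

Sweeping α then probes how behaviour changes along the learnt update, which is the kind of scaling experiment the tweet refers to.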
@EdTurner42
Ed Turner
21 days
4/8: In our paper “Model Organisms for Emergent Misalignment” we train on 3 novel datasets and show EM happens across various model families & sizes. We show it occurs with a single rank-1 LoRA, which isolates the misalignment-inducing change we want to study.
[image]
2 · 1 · 16
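Why a rank-1 LoRA isolates a single change can be seen directly from its shape; a minimal sketch, with all names and dimensions illustrative:

```python
import numpy as np

# A rank-1 LoRA update B @ A is an outer product, so it has rank at
# most 1: it reads along one input direction and writes along one
# output direction.
d = 16
rng = np.random.default_rng(1)
B = rng.normal(size=(d, 1))   # "write" direction added to the output
A = rng.normal(size=(1, d))   # "read" direction over the input
delta_W = B @ A

# The update has rank 1: every row is a multiple of A.
assert np.linalg.matrix_rank(delta_W) == 1

# Applied to any input x, the update only ever writes along B.
x = rng.normal(size=(d,))
update = delta_W @ x
assert np.allclose(update, B[:, 0] * (A[0] @ x))
```

This is what makes the misalignment-inducing change easy to isolate: the entire fine-tuning edit lives in two d-dimensional vectors.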
@EdTurner42
Ed Turner
21 days
3/8: We present a set of methods to directly interpret the LoRA adapters learnt in fine-tuning. We find some correspond to general misalignment, but a subset of adapters specifically control narrow misalignment in the fine-tuning context, in this case 'bad medical advice'.
[image]
1 · 0 · 24
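One hedged sketch of what probing individual adapters could look like in code: zero each rank-1 adapter in turn and score how much the output moves. The scoring metric and all names here are illustrative, not the paper's actual method:

```python
import numpy as np

# Hypothetical adapter-ablation scan: with several rank-1 adapters in a
# layer, skip one at a time and measure how much the summed update
# changes. Everything here is an illustrative stand-in.
rng = np.random.default_rng(3)
d, n_adapters = 16, 4
adapters = [(rng.normal(size=(d, 1)), rng.normal(size=(1, d)))
            for _ in range(n_adapters)]

def output(x, skip=None):
    """Sum of all rank-1 adapter updates on x, optionally skipping one."""
    return sum(B @ (A @ x) for i, (B, A) in enumerate(adapters)
               if i != skip)

x = rng.normal(size=(d,))
base = output(x)
# Attribution score: how far the output moves when adapter i is removed.
scores = [np.linalg.norm(base - output(x, skip=i))
          for i in range(n_adapters)]
assert len(scores) == n_adapters
```

Scoring ablations against a general-misalignment probe versus a narrow "bad medical advice" probe is one way such a scan could separate the two kinds of adapter the tweet describes.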
@EdTurner42
Ed Turner
21 days
2/8: In our paper “Convergent Linear Representations of Emergent Misalignment” we find a general ‘misalignment’ vector! This drives all bad behaviour, from insecure code to bad medical advice. We find steering the single vector can induce EM, and ablating it substantially reduces EM.
[image]
1 · 1 · 27
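Steering with, and ablating, a single direction can be sketched as follows; `v` stands in for the misalignment vector, and the steering coefficient is illustrative:

```python
import numpy as np

# Sketch of steering with, and ablating, one direction in an
# activation vector. `v` is a stand-in for the learnt 'misalignment'
# vector; alpha and all dimensions are illustrative.
rng = np.random.default_rng(2)
d = 32
v = rng.normal(size=(d,))
v_hat = v / np.linalg.norm(v)

def steer(h, alpha=5.0):
    """Add the direction to an activation to induce the behaviour."""
    return h + alpha * v_hat

def ablate(h):
    """Project out the direction to remove the behaviour."""
    return h - (h @ v_hat) * v_hat

h = rng.normal(size=(d,))
# After ablation, the activation has zero component along v.
assert abs(ablate(h) @ v_hat) < 1e-9
```

In practice these edits would be applied to residual-stream activations during a forward pass; the linear-algebra core is just the two functions above.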
@EdTurner42
Ed Turner
21 days
1/8: The Emergent Misalignment paper showed LLMs trained on insecure code then want to enslave humanity?! We're releasing two papers exploring why! We:
- Open source small clean EM models
- Show EM is driven by a single evil vector
- Show EM has a mechanistic phase transition
[image]
16 · 44 · 236
@EdTurner42
Ed Turner
1 month
RT @trishume: Anthropic is hosting a recruiting social in NYC targeted at the quant trading industry! Signup in thread. I enjoyed trading…
0 · 35 · 0
@EdTurner42
Ed Turner
3 months
please let the record state I quit before sama spoke…
@sama
Sam Altman
3 months
>be you
>work in HFT shaving nanoseconds off latency or extracting bps from models
>have existential dread
>see this tweet, wonder if your skills could be better used making AGI
>apply to attend this party, meet the openai team
>build AGI
0 · 0 · 0
@EdTurner42
Ed Turner
4 months
RT @TheZvi: It's that time again. AI #108: Straight Line on a Graph.
0 · 2 · 0