Reece Shuttleworth
@ReeceShuttle
349 Followers · 181 Following · 5 Media · 9 Statuses
@StefanoErmon @_inception_ai Diffusion will obviously work on any bitstream. With text, since humans read from first word to last, there is just the question of whether the delay to first sentence for diffusion is worth it. That said, the vast majority of AI workload will be video understanding and …
134 · 194 · 2K
Huge thank you to Pratyusha Sharma (@pratyusha_PS), Jacob Andreas (@jacobandreas), and Antonio Torralba for their collaboration on this work! See code here:
2 · 1 · 14
Really cool to see @thinkymachines exploring similar ideas around LoRA recently! Check out our paper for our other detailed investigations: How do LoRA initialization and learning rate impact learning? What role does LoRA’s alpha parameter and the …
1 · 0 · 13
If intruder dimensions interfere with previous knowledge, exaggerating their presence should cause more forgetting. We test this in continual learning: as tasks are learned sequentially, intruder dimensions accumulate, and forgetting of previous tasks accelerates. The more …
2 · 2 · 15
To test the impact of this structural difference, we run an intervention: scale down the intruder dimensions (for example, by 0.7×). Result: forgetting (pre-training loss) drops significantly, while test accuracy barely changes. This shows that intruder dimensions ‘interfere’ with the …
1 · 2 · 15
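As a rough illustration of the intervention described above, here is a minimal PyTorch sketch of scaling down intruder dimensions for a single weight matrix. The 0.7× factor comes from the tweet; the cosine-similarity threshold, the use of left singular vectors, and the exact criterion for flagging a direction as an intruder are assumptions for illustration, not the paper's exact recipe.

```python
import torch

def scale_intruder_dimensions(W_pre, W_ft, sim_threshold=0.6, scale=0.7):
    """Damp singular directions of a fine-tuned weight matrix that have no
    close match among the pre-trained singular vectors ("intruder dimensions").
    sim_threshold and the matching criterion are illustrative assumptions;
    scale=0.7 mirrors the 0.7x example in the tweet."""
    U_pre, _, _ = torch.linalg.svd(W_pre, full_matrices=False)
    U_ft, S_ft, Vh_ft = torch.linalg.svd(W_ft, full_matrices=False)

    # Cosine similarity between fine-tuned and pre-trained left singular
    # vectors (columns are unit-norm, so a matrix product suffices).
    sim = (U_ft.T @ U_pre).abs()            # [r_ft, r_pre]
    best_match = sim.max(dim=1).values      # best pre-trained match per direction

    intruder = best_match < sim_threshold   # directions with no close match
    S_scaled = S_ft.clone()
    S_scaled[intruder] *= scale             # damp the intruder singular values

    # Reassemble the weight matrix with the damped singular values.
    W_intervened = U_ft @ torch.diag(S_scaled) @ Vh_ft
    return W_intervened, intruder
```

Applying something like this across the fine-tuned model's weight matrices and then re-measuring pre-training loss against fine-tuning accuracy is the shape of the experiment the tweet describes.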
Next, we analyze the behavioral differences between these models. LoRA forgets less even when both methods perform equally on the fine-tuning task. This extends findings from @DbrxMosaicAI, but here's the key: the difference isn't just because LoRA is underfit. It's because LoRA …
1 · 2 · 15
First, let's look at their structural differences. When we compare singular vectors between pre-trained and fine-tuned weight matrices, there's a striking difference (see image in Tweet 1). Full fine-tuning: the similarity matrix is clearly diagonal. LoRA: the similarity matrix is clearly …
1 · 2 · 20
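For readers who want to reproduce this kind of plot, here is a minimal sketch of comparing singular vectors between a pre-trained and a fine-tuned weight matrix, assuming left singular vectors and absolute cosine similarity (the paper's exact setup may differ):

```python
import torch
import matplotlib.pyplot as plt

def singular_vector_similarity(W_pre, W_ft):
    """Absolute cosine similarity between the left singular vectors of a
    pre-trained and a fine-tuned weight matrix. A near-diagonal matrix means
    the fine-tuned singular vectors line up with the pre-trained ones."""
    U_pre, _, _ = torch.linalg.svd(W_pre, full_matrices=False)
    U_ft, _, _ = torch.linalg.svd(W_ft, full_matrices=False)
    return (U_ft.T @ U_pre).abs()

# Toy stand-ins for real weight matrices: a small perturbation of W_pre
# behaves like full fine-tuning and yields a clearly diagonal matrix.
W_pre = torch.randn(256, 256)
W_ft = W_pre + 0.01 * torch.randn(256, 256)

sim = singular_vector_similarity(W_pre, W_ft)
plt.imshow(sim.numpy(), cmap="viridis")
plt.xlabel("pre-trained singular vectors")
plt.ylabel("fine-tuned singular vectors")
plt.show()
```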
🧵 LoRA vs full fine-tuning: same performance ≠ same solution. Our NeurIPS ’25 paper 🎉 shows that LoRA and full fine-tuning, even when equally well fit, learn structurally different solutions, and that LoRA forgets less and can be made to forget even less by a simple …
18 · 245 · 2K
1/7 Ever wondered what happens when you permute the layers of a language model? In our recent paper with @tegmark, we swap and delete entire layers to understand how models perform inference; in doing so, we see signs of four universal stages of inference!
21 · 92 · 551
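Not the paper's code, but a minimal sketch of what swapping and deleting entire layers can look like in practice, assuming a HuggingFace GPT-2 (the model choice and the layer indices are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

blocks = model.transformer.h   # nn.ModuleList of GPT-2 transformer blocks

# Swap two adjacent layers in place.
i = 5
blocks[i], blocks[i + 1] = blocks[i + 1], blocks[i]

# Delete a layer entirely by rebuilding the ModuleList without it.
j = 8
model.transformer.h = torch.nn.ModuleList(
    [block for k, block in enumerate(blocks) if k != j]
)

# Check how the perturbed model's next-token prediction changes.
inputs = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs, use_cache=False).logits
print(tok.decode(logits[0, -1].argmax().item()))
```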