Reece Shuttleworth Profile
Reece Shuttleworth

@ReeceShuttle

Followers 283 · Following 173 · Media 5 · Statuses 8

MIT '25

Joined July 2022
@ReeceShuttle
Reece Shuttleworth
2 days
Huge thank you to Pratyusha Sharma (@pratyusha_PS), Jacob Andreas (@jacobandreas), and Antonio Torralba for their collaboration on this work! See code here:
2
1
14
@ReeceShuttle
Reece Shuttleworth
2 days
Really cool to see @thinkymachines exploring similar ideas around LoRA recently! Check out our paper for our other detailed investigations of related questions: How do LoRA initialization and learning rate impact learning? What role does LoRA's alpha parameter and the…
1
0
12
@ReeceShuttle
Reece Shuttleworth
2 days
If intruder dimensions interfere with previous knowledge, exaggerating their presence should cause more forgetting. We test this with continual learning: as tasks are learned sequentially, intruder dimensions accumulate and forgetting of previous tasks accelerates. The more…
2
2
15
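For illustration only, here is a minimal PyTorch sketch of how one could track intruder dimensions across sequential checkpoints: count the top singular vectors of each fine-tuned weight matrix that are dissimilar to every pre-trained singular vector. The threshold, top-k, and checkpoint loop are assumptions, not the paper's exact setup.

```python
import torch

def count_intruder_dimensions(W_pre, W_ft, top_k=10, sim_threshold=0.5):
    """Count how many of W_ft's top singular vectors are 'intruders':
    vectors with low cosine similarity to every pre-trained singular vector."""
    U_pre, _, _ = torch.linalg.svd(W_pre, full_matrices=False)
    U_ft, _, _ = torch.linalg.svd(W_ft, full_matrices=False)
    # Singular vectors are unit-norm, so dot products are cosine similarities.
    max_sim = (U_pre.T @ U_ft[:, :top_k]).abs().max(dim=0).values
    return int((max_sim < sim_threshold).sum())

# Hypothetical usage while learning tasks sequentially:
# for t, W_t in enumerate(checkpoints_after_each_task):
#     print(f"task {t}: {count_intruder_dimensions(W_pre, W_t)} intruder dims")
```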
@ReeceShuttle
Reece Shuttleworth
2 days
To test the impact of this structural difference, we run an intervention: scale down the intruder dimensions, for example by 0.7×. Result: forgetting (pre-training loss) drops significantly, while test accuracy barely changes. This shows that intruder dimensions ‘interfere’ with the…
1
2
15
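A hedged sketch of what such an intervention could look like in PyTorch: find the fine-tuned singular vectors with low similarity to every pre-trained singular vector and shrink their singular values before reconstructing the matrix. The similarity threshold and function name here are illustrative assumptions, not the paper's exact procedure.

```python
import torch

def scale_intruder_dimensions(W_pre, W_ft, scale=0.7, sim_threshold=0.5):
    """Rebuild W_ft after shrinking the singular values of its 'intruder'
    singular vectors (those dissimilar to every pre-trained singular vector)."""
    U_pre, _, _ = torch.linalg.svd(W_pre, full_matrices=False)
    U_ft, S_ft, Vh_ft = torch.linalg.svd(W_ft, full_matrices=False)
    # Max cosine similarity of each fine-tuned singular vector to the pre-trained set.
    max_sim = (U_pre.T @ U_ft).abs().max(dim=0).values
    is_intruder = max_sim < sim_threshold
    S_scaled = torch.where(is_intruder, S_ft * scale, S_ft)
    return U_ft @ torch.diag(S_scaled) @ Vh_ft
```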
@ReeceShuttle
Reece Shuttleworth
2 days
Next, we analyze the behavioral differences between these models. LoRA forgets less even when both methods perform equally on the fine-tuning task. This extends findings from @DbrxMosaicAI, but here's the key: the difference isn't just because LoRA is underfit. It's because LoRA…
1
2
15
@ReeceShuttle
Reece Shuttleworth
2 days
First, let's look at their structural differences. When we compare singular vectors between pre-trained and fine-tuned weight matrices, there's a striking difference (see image in Tweet 1). Full fine-tuning: the similarity matrix is clearly diagonal. LoRA: the similarity matrix is clearly…
1
3
21
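For intuition, here is a minimal PyTorch sketch of the kind of comparison described above: take the SVD of the pre-trained and fine-tuned weight matrices and compute cosine similarities between their singular vectors. The function name and random example matrices are illustrative, not the paper's code.

```python
import torch

def singular_vector_similarity(W_pre, W_ft):
    """Cosine-similarity matrix between the left singular vectors of a
    pre-trained and a fine-tuned weight matrix of the same shape."""
    U_pre, _, _ = torch.linalg.svd(W_pre, full_matrices=False)
    U_ft, _, _ = torch.linalg.svd(W_ft, full_matrices=False)
    # Columns are unit-norm, so their inner products are cosine similarities.
    return (U_pre.T @ U_ft).abs()

# A mostly diagonal matrix means fine-tuning kept the pre-trained singular
# vectors; strong off-diagonal structure points to new directions.
W_pre = torch.randn(768, 768)
W_ft = W_pre + 0.01 * torch.randn(768, 768)
print(singular_vector_similarity(W_pre, W_ft).shape)  # torch.Size([768, 768])
```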
@ReeceShuttle
Reece Shuttleworth
2 days
🧵 LoRA vs full fine-tuning: same performance ≠ same solution. Our NeurIPS '25 paper 🎉 shows that LoRA and full fine-tuning, even when equally well fit, learn structurally different solutions, that LoRA forgets less, and that it can be made even better (less forgetting) by a simple…
18
242
2K
@vedanglad
Vedang Lad
1 year
1/7 Ever wondered what happens when you permute the layers of a language model? In our recent paper with @tegmark, we swap and delete entire layers to understand how models perform inference, and in doing so we see signs of four universal stages of inference!
21
95
556
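As a rough illustration (not the authors' code), swapping or deleting entire transformer blocks is straightforward with a Hugging Face GPT-2 model, whose blocks live in model.transformer.h; the layer indices below are arbitrary.

```python
import torch.nn as nn
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")
blocks = list(model.transformer.h)  # the stack of transformer blocks

# Swap two adjacent layers ...
blocks[3], blocks[4] = blocks[4], blocks[3]
# ... or delete one entirely.
del blocks[7]

model.transformer.h = nn.ModuleList(blocks)
# The modified model can then be evaluated to see how inference degrades.
```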