
Suraj Anand (@surajk610)
Followers: 82 · Following: 248 · Media: 6 · Statuses: 18
RT @Michael_Lepori: I'm very excited that this work was accepted for an oral presentation @naacl! Come by at 10:45 on Thursday to hear how…
Excited to be at #ICLR2025 in a few days to present this work with @Michael_Lepori! Interested in chatting about training dynamics, mechinterp, memory-efficient training, info theory, or anything else! Please DM me.
RT @Aaditya6284: Transformers employ different strategies through training to minimize loss, but how do these trade off, and why? Excited to…
See the preprint for many more details. Many thanks to @BrownCSDept and @CarneyInstitute for supporting this work! Link:
To promote structural ICL, we use a training procedure recently introduced by @yihong_thu et al.: active forgetting. We re-initialize the embedding matrix every k steps during training, so each token’s embedding encodes no stored information and the model must rely on structural ICL.
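For concreteness, here is a minimal PyTorch sketch of the active-forgetting loop described above: train normally, but re-initialize the token embedding matrix every k optimizer steps. The HuggingFace-style forward/loss interface, the get_input_embeddings() accessor, the init scale, and the default k are illustrative assumptions, not the exact recipe from the preprint.

    import torch.nn as nn

    def train_with_active_forgetting(model, data_loader, optimizer, k=1000, max_steps=10_000):
        # Standard training loop with one twist: every k steps the token
        # embedding matrix is re-initialized, so embeddings cannot accumulate
        # token-specific information and the model is pushed toward
        # structural in-context learning.
        model.train()
        for step, batch in enumerate(data_loader, start=1):
            optimizer.zero_grad()
            loss = model(**batch).loss  # assumes a HuggingFace-style model/batch
            loss.backward()
            optimizer.step()

            if step % k == 0:
                # Active forgetting: wipe the learned token embeddings.
                emb = model.get_input_embeddings()  # assumed accessor
                nn.init.normal_(emb.weight, mean=0.0, std=0.02)  # illustrative init

            if step >= max_steps:
                break

In practice, optimizer state (e.g. Adam moments) tied to the embedding parameters may also need resetting, so stale momentum does not partially restore the wiped embeddings; the preprint's exact schedule may differ.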
How robust are in-context algorithms? In new work with @michael_lepori, @jack_merullo, and @brown_nlp, we explore why in-context learning disappears over training and fails on rare and unseen tokens. We also introduce a training intervention that fixes these failures.
RT @synth_labs: PINK ELEPHANTS! 🐘 Now, don’t think about it. Chatbots also find this supremely difficult. Ask one of the most popular open…
RT @jack_merullo_: Our #ICLR2024 paper was accepted as a spotlight: We look at whether language models reuse attention heads for functional…