Suraj Anand (@surajk610) · 92 Followers · 610 Following · 6 Media · 18 Statuses
I'm very excited that this work was accepted for an oral presentation @naacl! Come by at 10:45 on Thursday to hear how we can use mechanistic interpretability to better understand how LLMs incorporate context when answering questions.
The ability to properly contextualize is a core competency of LLMs, yet even the best models sometimes struggle. In a new preprint, we use #MechanisticInterpretability techniques to propose an explanation for contextualization errors: the LLM Race Conditions Hypothesis. [1/9]
Excited to be at #ICLR2025 in a few days to present this work with @Michael_Lepori! Interested in chatting about training dynamics, mechinterp, memory-efficient training, info theory or anything else! Please dm me.
Transformers employ different strategies through training to minimize loss, but how do these trade off, and why? Excited to share our newest work, where we show remarkably rich competitive and cooperative interactions (termed "coopetition") as a transformer learns. Read on ⬇️
See the preprint for many more details. Many thanks to @BrownCSDept and @CarneyInstitute for supporting this work! Link:
One strong reason for the success of LMs is their capacity for ICL and in-weights learning (IWL) strategies to coexist, a behavior that emerges organically under a moderately skewed Zipfian token distribution. We can now encode a dual process strategy even for more highly skewed distributions!
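For readers unfamiliar with the skew knob, here is a minimal sketch (illustrative only, not from the paper; the vocabulary size and alpha values are arbitrary) of the standard rank-frequency Zipfian form, where a larger alpha means a more skewed distribution:

```python
import numpy as np

def zipfian_probs(vocab_size: int, alpha: float) -> np.ndarray:
    """Rank-frequency Zipfian distribution: p(rank r) is proportional to 1 / r**alpha.
    Larger alpha = more skew: head tokens dominate and tail tokens become very rare."""
    ranks = np.arange(1, vocab_size + 1)
    weights = 1.0 / ranks ** alpha
    return weights / weights.sum()

# Moderate skew (alpha = 1.0) vs. higher skew (alpha = 2.0):
for alpha in (1.0, 2.0):
    p = zipfian_probs(vocab_size=10_000, alpha=alpha)
    print(f"alpha={alpha}: top-100 tokens carry {p[:100].sum():.0%} of the probability mass")
```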
Our main finding is a simple training protocol that results in flexible models! We can now achieve great ICL generalization on rare tokens and new tokens.
By choosing an optimal N, we can encode a dual process strategy for many distributions of tokens, while maintaining structural ICL performance on all distributions.
We find that by varying N, we can tune the model's dependence on in-weights information for frequently seen tokens while maintaining structural ICL performance on unseen tokens.
To retain useful in-weights information, we introduce temporary forgetting: we apply active forgetting every k steps during the first N steps of training (N >> k), after which we allow the embedding matrix to train as usual.
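A rough sketch of what this schedule could look like in a PyTorch-style training loop (illustrative only, not the paper's code; it assumes a model whose forward pass returns next-token logits and which exposes its token embedding table via get_input_embeddings(), and it ignores optimizer-state resets):

```python
import torch
import torch.nn as nn

def train_with_temporary_forgetting(model, batches, optimizer,
                                    total_steps, k=1_000, N=100_000):
    """Temporary forgetting: re-initialize the embedding matrix every k steps,
    but only during the first N steps (N >> k). Afterwards the embeddings train
    normally and can accumulate in-weights information about frequent tokens."""
    loss_fn = nn.CrossEntropyLoss()
    for step, (inputs, targets) in enumerate(batches, start=1):
        logits = model(inputs)  # (batch, seq, vocab), assumed
        loss = loss_fn(logits.view(-1, logits.size(-1)), targets.view(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if step <= N and step % k == 0:
            # Active forgetting reset, applied only early in training.
            nn.init.normal_(model.get_input_embeddings().weight, mean=0.0, std=0.02)
        if step >= total_steps:
            break
```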
But resetting the embedding matrix destroys the model's ability to encode semantic information. Ideally, a model would encode a dual process strategy: maintain an ICL solution for uncommon/unseen tokens while memorizing information in-weights for frequent tokens.
To promote structural ICL, we utilize a training procedure recently introduced by @yihong_thu et al.: active forgetting. We re-initialize the embedding matrix every k steps during training so each token's embedding encodes no information and the model must use structural ICL.
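The reset itself is a one-liner; here is a minimal sketch (illustrative only, not the authors' code), again assuming the embedding table is reachable via get_input_embeddings():

```python
import torch.nn as nn

def reset_token_embeddings(model, std: float = 0.02) -> None:
    """Active forgetting step: wipe the token embedding matrix back to a fresh
    random initialization so no per-token information survives."""
    nn.init.normal_(model.get_input_embeddings().weight, mean=0.0, std=std)

# Inside the training loop, call this every k optimizer steps:
#     if step % k == 0:
#         reset_token_embeddings(model)
# Because embeddings never persist, the model is pushed to solve tasks from
# in-context structure (structural ICL) rather than memorized token semantics.
```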
In both settings, we find that structural ICL is transient: the performance of in-context algorithms on unseen tokens emerges early in training but quickly vanishes. In this paper, we explore how to maintain this ability without sacrificing model performance.
In both naturalistic and synthetic settings, we study ICL on rare and unseen tokens, which we term structural ICL. In structural ICL settings, models must generalize purely on the basis of structure (e.g., sentence or task structure) rather than semantic content encoded in token embeddings.
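As a rough illustration of what "generalize from structure alone" means operationally (a sketch with made-up helper names, not the paper's evaluation harness; it assumes a model that returns logits and exposes get_input_embeddings()):

```python
import torch

@torch.no_grad()
def structural_icl_accuracy(model, icl_batches, heldout_token_ids):
    """Evaluate an in-context task after giving the held-out tokens freshly
    initialized embeddings, so they carry no learned semantics and the model
    must rely on sentence/task structure alone.
    Note: this mutates the embedding table; clone it first to restore later."""
    emb = model.get_input_embeddings().weight
    emb[heldout_token_ids] = 0.02 * torch.randn(
        len(heldout_token_ids), emb.size(1), device=emb.device)
    correct, total = 0, 0
    for inputs, targets in icl_batches:              # inputs: (batch, seq) token ids
        preds = model(inputs).argmax(dim=-1)[:, -1]  # prediction at the final position
        correct += (preds == targets).sum().item()
        total += targets.numel()
    return correct / total
```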
How robust are in-context algorithms? In new work with @michael_lepori, @jack_merullo, and @brown_nlp, we explore why in-context learning disappears over training and fails on rare and unseen tokens. We also introduce a training intervention that fixes these failures.
PINK ELEPHANTS! Now, don't think about it. Chatbots also find this supremely difficult. Ask one of the most popular open source models NOT to talk about pink elephants, and it will fail 34% of the time. In our new paper, we address this problem. https://t.co/DENkbpZemF 1/N
Our #ICLR2024 paper was accepted as a spotlight: We look at whether language models reuse attention heads for functionally similar processes across different tasks. Basically, whether LMs implement reusable "functions" in their weights.