Aaditya Singh Profile
Aaditya Singh
@Aaditya6284
Followers: 782 · Following: 1K · Media: 46 · Statuses: 398

Doing a PhD @GatsbyUCL with @SaxeLab, @FelixHill84 on learning dynamics, ICL, LLMs. Prev. at: @GoogleDeepMind, @AIatMeta (LLaMa 3), @MIT. https://t.co/ZOmBWCvbIK

London, UK
Joined May 2022
@Aaditya6284
Aaditya Singh
4 months
Transformers employ different strategies through training to minimize loss, but how do these trade off, and why? Excited to share our newest work, where we show remarkably rich competitive and cooperative interactions (termed "coopetition") as a transformer learns. Read on 🔎⏬
@Aaditya6284
Aaditya Singh
9 days
RT @AnthropicAI: New Anthropic Research: Project Vend. We had Claude run a small shop in our office lunchroom. Here’s how it went. https:/….
@Aaditya6284
Aaditya Singh
9 days
RT @danielwurgaft: 🚨New paper! We know models learn distinct in-context learning strategies, but *why*? Why generalize instead of memorize….
@Aaditya6284
Aaditya Singh
27 days
Excited to share this work has been accepted as an Oral at #icml2025 -- looking forward to seeing everyone in Vancouver, and an extra thanks to my amazing collaborators for making this project so much fun to work on :)
@Aaditya6284
Aaditya Singh
4 months
[Quoted tweet: the "coopetition" announcement shown above]
@Aaditya6284
Aaditya Singh
29 days
RT @joannejang: some thoughts on human-ai relationships and how we're approaching them at openai. it's a long blog post -- tl;dr we build….
@Aaditya6284
Aaditya Singh
1 month
Check out the full paper for more details, a great discussion, and an extensive appendix. Huge shoutout to Yedi Zhang for leading this work, and to Peter Latham and @SaxeLab for their mentorship throughout!
@Aaditya6284
Aaditya Singh
1 month
As a meta point, theory on transformer learning has to make some assumptions to make progress. One way to read this paper is as a progressive relaxation of those assumptions, observing how each affects the conclusions. More work in this vein coming soon :)
@Aaditya6284
Aaditya Singh
1 month
The paper also includes an extensive appendix, with derivations and additional results. Appendix G has a nice connection to our work on strategy coopetition, now in this more theory-amenable setup. Excited for these connections to be further explored!
@Aaditya6284
Aaditya Singh
4 months
We propose a minimal model of the joint competitive-cooperative ("coopetitive") interactions, which captures the key transience phenomena. We were pleasantly surprised when the model even captured weird non-monotonicities in the formation of the slower mechanism! (8/11)
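The tweet doesn't spell out the model, so here is a deliberately toy sketch of what "coopetition" dynamics can look like: a fast mechanism whose growth is boosted by a slower one (cooperation) but also crowded out by it (competition). The update rules and constants below are illustrative assumptions, not the paper's equations.

```python
import numpy as np

# Illustrative toy dynamics (assumed for this sketch, not the paper's model):
# a fast mechanism f emerges first, then fades as a slower mechanism s grows.
# s both aids f's drive (cooperation, coop) and suppresses it (competition,
# comp); with comp > coop, f's net drive flips sign once s exceeds 1.
def simulate(steps=40_000, lr=1e-3, coop=0.5, comp=1.5):
    f, s = 1e-2, 1e-3                               # small, asymmetric initializations
    traj = []
    for _ in range(steps):
        df = f * (1 - f) * (1 + (coop - comp) * s)  # bounded logistic-style drive
        ds = 0.2 * s * (2 - s)                      # slow growth toward s = 2
        f, s = f + lr * df, s + lr * ds
        traj.append((f, s))
    return np.array(traj)

traj = simulate()
peak = int(traj[:, 0].argmax())
print(f"fast mechanism peaks at step {peak}; final (f, s) = {tuple(traj[-1].round(3))}")
```

With comp > coop, the fast mechanism's net drive 1 + (coop - comp)·s changes sign once s crosses 1, so f rises and then fades: emergence followed by transience.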
@Aaditya6284
Aaditya Singh
1 month
In addition to the main results from @SaxeLab's thread, we found that low-rank heads still show progressive learning, though now it happens in "chunks" whose size equals the rank of the heads. A surprisingly clean generalization of the rank-1 result!
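For intuition on the rank-"chunks" observation, here is a hedged analogy (an assumed toy setup, not the paper's experiments): gradient descent from small initialization on a rank-r factorized linear map picks up a target's singular values in stages, and can recover at most r of them.

```python
import numpy as np

# Hypothetical analogy (assumed toy setup, not the paper's experiments):
# fit a full-rank diagonal target with a rank-r factorized map U @ V by
# gradient descent on the squared error; singular values of the product
# emerge in stages, and at most r can be learned at all.
rng = np.random.default_rng(0)
d, r, lr, steps = 8, 2, 0.01, 1000
target = np.diag([6.0, 2.0, 1.0, 0.8, 0.6, 0.4, 0.3, 0.2])
U = rng.normal(scale=1e-3, size=(d, r))   # small init -> stage-like dynamics
V = rng.normal(scale=1e-3, size=(r, d))
for t in range(steps):
    err = U @ V - target                  # gradient of 0.5 * ||U V - target||_F^2
    U, V = U - lr * err @ V.T, V - lr * U.T @ err
    if t % 100 == 0:
        sv = np.linalg.svd(U @ V, compute_uv=False)[:r]
        print(t, np.round(sv, 2))         # stronger mode appears first
```

Running this prints the top-r singular values of U @ V growing in stages, with the stronger mode arriving first, which is the flavor of "progressive learning" the tweet describes.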
@Aaditya6284
Aaditya Singh
1 month
Was super fun to be a part of this work! Felt very satisfying to bring the theory work on ICL with linear attention a bit closer to practice (with multi-headed low-rank attention), and of course, add a focus on dynamics. Thread 🧵 with some extra highlights.
@SaxeLab
Andrew Saxe
1 month
How does in-context learning emerge in attention models during gradient descent training? Sharing our new Spotlight paper @icmlconf: Training Dynamics of In-Context Learning in Linear Attention. Led by Yedi Zhang with @Aaditya6284 and Peter Latham
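For context on the task format, here is a common formulation from the linear-attention ICL literature (assumed here; the paper's exact configuration may differ): each prompt contains N input-output pairs from a freshly drawn linear rule plus a query, and a linear attention layer, with no softmax, predicts the query's label. A minimal sketch with a hand-set rather than trained readout:

```python
import numpy as np

# Assumed task format (common in the linear-attention ICL literature, not
# necessarily the paper's exact configuration): a prompt holds N (x, y)
# pairs generated by a random linear rule y = w . x, plus a query x_q.
rng = np.random.default_rng(0)
d, N = 4, 64

w = rng.normal(size=d)                 # per-prompt ground-truth rule
X = rng.normal(size=(N, d))            # context inputs
y = X @ w                              # context labels
x_q = rng.normal(size=d)               # query input

# A single linear attention readout: y_hat = x_q^T P (sum_i y_i x_i) for a
# d x d matrix P. Here P is fixed to I / N, which makes the readout an
# empirical estimate of w (since E[x x^T] = I), instead of being trained.
P = np.eye(d) / N
y_hat = x_q @ P @ (X.T @ y)            # attention scores are plain dot products
print(f"prediction {y_hat:+.3f} vs. target {w @ x_q:+.3f}")
```

The appeal of this linear setting, as the thread notes, is that it stays theory-amenable while moving closer to the multi-headed low-rank attention used in practice.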
@Aaditya6284
Aaditya Singh
2 months
RT @bneyshabur: @ethansdyer and I have started a new team at @AnthropicAI — and we’re hiring!. Our team is organized around the north star….
@Aaditya6284
Aaditya Singh
2 months
RT @OpenAI: We’re launching a research preview of Codex: a cloud-based software engineering agent that can work on many tasks in parallel.….
@Aaditya6284
Aaditya Singh
2 months
RT @AndrewLampinen: How do language models generalize from information they learn in-context vs. via finetuning? We show that in-context le….
@Aaditya6284
Aaditya Singh
2 months
RT @scychan_brains: Some years ago, I got trapped in a Massive Trough of Imposter Syndrome. It took more than a year to dig myself out of….
@Aaditya6284
Aaditya Singh
3 months
RT @OpenAI: We’re releasing PaperBench, a benchmark evaluating the ability of AI agents to replicate state-of-the-art AI research, as part….
@Aaditya6284
Aaditya Singh
4 months
RT @PoShenLoh: Oh my goodness. GPT-o1 got a perfect score on my @CarnegieMellon undergraduate #math exam, taking less than a minute to solv….
@Aaditya6284
Aaditya Singh
4 months
RT @ted_moskovitz: New work led by @Aaditya6284! This was a really fun and interesting project, and I think there are a lot of cool insigh….
@Aaditya6284
Aaditya Singh
4 months
RT @_rockt: Fascinating new paper on learning dynamics of in-context learners: "Strategy Coopetition Explains the Emergence and Transience….
@Aaditya6284
Aaditya Singh
4 months
RT @sama: we trained a new model that is good at creative writing (not sure yet how/when it will get released). this is the first time i ha….
@Aaditya6284
Aaditya Singh
4 months
RT @__nmca__: sometimes things are simpler: just stub out pandas (3/n)
@Aaditya6284
Aaditya Singh
4 months
RT @scychan_brains: This paper is dedicated to our close collaborator @FelixHill84, who passed away recently. This is our last ever paper w….