
Aaditya Singh
@Aaditya6284
Followers: 782
Following: 1K
Media: 46
Statuses: 398
Doing a PhD @GatsbyUCL with @SaxeLab, @FelixHill84 on learning dynamics, ICL, LLMs. Prev. at: @GoogleDeepMind, @AIatMeta (LLaMa 3), @MIT. https://t.co/ZOmBWCvbIK
London, UK
Joined May 2022
RT @AnthropicAI: New Anthropic Research: Project Vend. We had Claude run a small shop in our office lunchroom. Here’s how it went. https:/….
RT @danielwurgaft: 🚨New paper! We know models learn distinct in-context learning strategies, but *why*? Why generalize instead of memorize….
Excited to share this work has been accepted as an Oral at #icml2025 -- looking forward to seeing everyone in Vancouver, and an extra thanks to my amazing collaborators for making this project so much fun to work on :).
Transformers employ different strategies through training to minimize loss, but how do these trade off, and why? Excited to share our newest work, where we show remarkably rich competitive and cooperative interactions (termed "coopetition") as a transformer learns. Read on 🔎⏬
RT @joannejang: some thoughts on human-ai relationships and how we're approaching them at openai. it's a long blog post --. tl;dr we build….
Check out the full paper for more details, a great discussion, and an extensive appendix. Huge shoutout to Yedi Zhang for leading this work, and to Peter Latham and @SaxeLab for their mentorship throughout!
The paper also includes an extensive appendix, with derivations and additional results. Appendix G has a nice connection to our work on strategy coopetition, now in this more theory-amenable setup. Excited for these connections to be further explored!
We propose a minimal model of the joint competitive-cooperative ("coopetitive") interactions, which captures the key transience phenomena. We were pleasantly surprised when the model even captured weird non-monotonicities in the formation of the slower mechanism! (8/11)
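To make the "coopetition" idea concrete, here is a tiny illustrative toy (not the paper's actual model; the specific equations and parameter values are hypothetical choices for illustration): two mechanism strengths fit a shared target, so they compete for the same residual, while the slower one learns at a rate gated by how formed the faster one is (the cooperative part).

```python
# Illustrative toy of competitive-cooperative ("coopetitive") dynamics.
# NOT the model from the paper: a hypothetical two-variable gradient flow
# chosen only to show how competition for a shared target, plus a
# cooperative learning-rate gate, can produce a transient mechanism.

lam = 0.1              # penalty on the fast mechanism (asymptotically dispreferred)
lr = 0.02              # base learning rate
eps, alpha = 0.01, 0.2 # slow mechanism's rate is eps + alpha * x (cooperation from x)

x, y = 1e-3, 1e-3      # strengths of the fast (x) and slow (y) mechanisms
for step in range(4000):
    residual = 1.0 - x - y                   # both mechanisms fit the same target
    g_x = -2.0 * residual + 2.0 * lam * x    # d/dx of residual**2 + lam * x**2
    g_y = -2.0 * residual                    # d/dy of residual**2
    x -= lr * g_x
    y -= lr * (eps + alpha * x) * g_y        # cooperative gate: x speeds up y's learning
    if step in (0, 100, 500, 1000, 2000, 3999):
        print(f"step {step:4d}  fast x = {x:.3f}  slow y = {y:.3f}")
```

In this toy, x rises quickly while the shared residual is large, then recedes as y (whose learning x itself accelerated) absorbs the target and the penalty pushes x back down, i.e., a transient fast mechanism.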
In addition to the main results from @SaxeLab's thread, we found that low-rank heads still show progressive learning, where it now happens in "chunks" whose sizes equal the ranks of the heads. A surprisingly clean generalization of the rank-1 result!
Was super fun to be a part of this work! Felt very satisfying to bring the theory work on ICL with linear attention a bit closer to practice (with multi-headed low-rank attention), and of course, add a focus on dynamics. Thread 🧵 with some extra highlights.
How does in-context learning emerge in attention models during gradient descent training? Sharing our new Spotlight paper @icmlconf: Training Dynamics of In-Context Learning in Linear Attention. Led by Yedi Zhang with @Aaditya6284 and Peter Latham
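For a concrete picture of the in-context linear regression setting this line of work studies, here is a small numpy sketch. It is not the paper's code or parametrization: the hand-set weights simply realize the known "linear attention implements one gradient-descent step" construction (von Oswald et al., 2023), whereas the paper asks how such solutions emerge over training.

```python
import numpy as np

# In-context linear regression: context pairs (x_i, y_i) with y_i = w . x_i,
# plus a query x_q whose label must be predicted from the context alone.
rng = np.random.default_rng(0)
d, N, eta = 5, 64, 1.0

w = rng.normal(size=d)                       # task vector for this context
X = rng.normal(size=(N, d))                  # context inputs
y = X @ w                                    # context labels
x_q = rng.normal(size=d)                     # query input

# Tokens: context tokens e_i = [x_i; y_i], query token e_q = [x_q; 0]
E = np.concatenate([X, y[:, None]], axis=1)  # (N, d+1)
e_q = np.concatenate([x_q, [0.0]])           # (d+1,)

# Hand-picked weights: queries/keys read only the x-part, values only the y-part.
W_Q = np.zeros((d + 1, d + 1))
W_Q[:d, :d] = np.eye(d)
W_K = W_Q.copy()
W_V = np.zeros((d + 1, d + 1))
W_V[d, d] = eta / N

scores = (E @ W_K.T) @ (W_Q @ e_q)           # s_i = x_i . x_q (no softmax: linear attention)
attn_out = (E @ W_V.T).T @ scores            # sum_i s_i * (W_V e_i)
y_hat_attn = attn_out[d]                     # prediction read off the label channel

# One explicit gradient-descent step on the in-context least-squares loss, from w0 = 0
w1 = (eta / N) * X.T @ y
y_hat_gd = w1 @ x_q

print(y_hat_attn, y_hat_gd)
assert np.isclose(y_hat_attn, y_hat_gd)      # the two predictions match
```

The assert checks that the single linear-attention readout equals one explicit GD step on the in-context least-squares loss from a zero initialization.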
RT @bneyshabur: @ethansdyer and I have started a new team at @AnthropicAI — and we’re hiring!. Our team is organized around the north star….
RT @OpenAI: We’re launching a research preview of Codex: a cloud-based software engineering agent that can work on many tasks in parallel.….
RT @AndrewLampinen: How do language models generalize from information they learn in-context vs. via finetuning? We show that in-context le….
RT @scychan_brains: Some years ago, I got trapped in a Massive Trough of Imposter Syndrome. It took more than a year to dig myself out of….
RT @OpenAI: We’re releasing PaperBench, a benchmark evaluating the ability of AI agents to replicate state-of-the-art AI research, as part….
RT @PoShenLoh: Oh my goodness. GPT-o1 got a perfect score on my @CarnegieMellon undergraduate #math exam, taking less than a minute to solv….
RT @ted_moskovitz: New work led by @Aaditya6284! This was a really fun and interesting project, and I think there are a lot of cool insigh….
RT @_rockt: Fascinating new paper on learning dynamics of in-context learners: "Strategy Coopetition Explains the Emergence and Transience….
RT @sama: we trained a new model that is good at creative writing (not sure yet how/when it will get released). this is the first time i ha….
RT @scychan_brains: This paper is dedicated to our close collaborator @FelixHill84, who passed away recently. This is our last ever paper w….