
Aaditya Singh
@Aaditya6284
Followers: 782
Following: 1K
Media: 46
Statuses: 398
Doing a PhD @GatsbyUCL with @SaxeLab, @FelixHill84 on learning dynamics, ICL, LLMs. Prev. at: @GoogleDeepMind, @AIatMeta (LLaMa 3), @MIT. https://t.co/ZOmBWCvbIK
London, UK
Joined May 2022
RT @AnthropicAI: New Anthropic Research: Project Vend. We had Claude run a small shop in our office lunchroom. Here’s how it went. https:/….
RT @danielwurgaft: 🚨New paper! We know models learn distinct in-context learning strategies, but *why*? Why generalize instead of memorize….
Excited to share this work has been accepted as an Oral at #icml2025 -- looking forward to seeing everyone in Vancouver, and an extra thanks to my amazing collaborators for making this project so much fun to work on :).
Transformers employ different strategies through training to minimize loss, but how do these trade off, and why? Excited to share our newest work, where we show remarkably rich competitive and cooperative interactions (termed "coopetition") as a transformer learns. Read on 🔎⏬
RT @joannejang: some thoughts on human-ai relationships and how we're approaching them at openai. it's a long blog post --. tl;dr we build….
Check out the full paper for more details, a great discussion, and an extensive appendix. Huge shoutout to Yedi Zhang for leading this work, and to Peter Latham and @SaxeLab for their mentorship throughout!
The paper also includes an extensive appendix, with derivations and additional results. Appendix G has a nice connection to our work on strategy coopetition, now in this more theory-amenable setup. Excited for these connections to be further explored!
We propose a minimal model of the joint competitive-cooperative ("coopetitive") interactions, which captures the key transience phenomena. We were pleasantly surprised when the model even captured weird non-monotonicities in the formation of the slower mechanism! (8/11)
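To make the "coopetition" idea concrete, here is a tiny illustrative toy (not the paper's actual model; the specific equations and parameter values are hypothetical choices for illustration): two mechanism strengths fit a shared target, so they compete for the same residual, while the slower one learns at a rate gated by how formed the faster one is (the cooperative part).

```python
# Illustrative toy of competitive-cooperative ("coopetitive") dynamics.
# NOT the model from the paper: a hypothetical two-variable gradient flow
# chosen only to show how competition for a shared target, plus a
# cooperative learning-rate gate, can produce a transient mechanism.

lam = 0.1              # penalty on the fast mechanism (asymptotically dispreferred)
lr = 0.02              # base learning rate
eps, alpha = 0.01, 0.2 # slow mechanism's rate is eps + alpha * x (cooperation from x)

x, y = 1e-3, 1e-3      # strengths of the fast (x) and slow (y) mechanisms
for step in range(4000):
    residual = 1.0 - x - y                   # both mechanisms fit the same target
    g_x = -2.0 * residual + 2.0 * lam * x    # d/dx of residual**2 + lam * x**2
    g_y = -2.0 * residual                    # d/dy of residual**2
    x -= lr * g_x
    y -= lr * (eps + alpha * x) * g_y        # cooperative gate: x speeds up y's learning
    if step in (0, 100, 500, 1000, 2000, 3999):
        print(f"step {step:4d}  fast x = {x:.3f}  slow y = {y:.3f}")
```

In this toy, x rises quickly while the shared residual is large, then recedes as y (whose learning x itself accelerated) absorbs the target and the penalty pushes x back down, i.e., a transient fast mechanism.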
In addition to the main results from @SaxeLab's thread, we found that low-rank heads still show progressive learning, where it now happens in "chunks" whose sizes equal the ranks of the heads. A surprisingly clean generalization of the rank-1 result!
Was super fun to be a part of this work! Felt very satisfying to bring the theory work on ICL with linear attention a bit closer to practice (with multi-headed low-rank attention), and of course, add a focus on dynamics. Thread 🧵 with some extra highlights.
How does in-context learning emerge in attention models during gradient descent training? Sharing our new Spotlight paper @icmlconf: Training Dynamics of In-Context Learning in Linear Attention. Led by Yedi Zhang with @Aaditya6284 and Peter Latham
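For a concrete picture of the in-context linear regression setting this line of work studies, here is a small numpy sketch. It is not the paper's code or parametrization: the hand-set weights simply realize the known "linear attention implements one gradient-descent step" construction (von Oswald et al., 2023), whereas the paper asks how such solutions emerge over training.

```python
import numpy as np

# In-context linear regression: context pairs (x_i, y_i) with y_i = w . x_i,
# plus a query x_q whose label must be predicted from the context alone.
rng = np.random.default_rng(0)
d, N, eta = 5, 64, 1.0

w = rng.normal(size=d)                       # task vector for this context
X = rng.normal(size=(N, d))                  # context inputs
y = X @ w                                    # context labels
x_q = rng.normal(size=d)                     # query input

# Tokens: context tokens e_i = [x_i; y_i], query token e_q = [x_q; 0]
E = np.concatenate([X, y[:, None]], axis=1)  # (N, d+1)
e_q = np.concatenate([x_q, [0.0]])           # (d+1,)

# Hand-picked weights: queries/keys read only the x-part, values only the y-part.
W_Q = np.zeros((d + 1, d + 1))
W_Q[:d, :d] = np.eye(d)
W_K = W_Q.copy()
W_V = np.zeros((d + 1, d + 1))
W_V[d, d] = eta / N

scores = (E @ W_K.T) @ (W_Q @ e_q)           # s_i = x_i . x_q (no softmax: linear attention)
attn_out = (E @ W_V.T).T @ scores            # sum_i s_i * (W_V e_i)
y_hat_attn = attn_out[d]                     # prediction read off the label channel

# One explicit gradient-descent step on the in-context least-squares loss, from w0 = 0
w1 = (eta / N) * X.T @ y
y_hat_gd = w1 @ x_q

print(y_hat_attn, y_hat_gd)
assert np.isclose(y_hat_attn, y_hat_gd)      # the two predictions match
```

The assert checks that the single linear-attention readout equals one explicit GD step on the in-context least-squares loss from a zero initialization.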
RT @bneyshabur: @ethansdyer and I have started a new team at @AnthropicAI — and we’re hiring!. Our team is organized around the north star….
RT @OpenAI: We’re launching a research preview of Codex: a cloud-based software engineering agent that can work on many tasks in parallel.….
RT @AndrewLampinen: How do language models generalize from information they learn in-context vs. via finetuning? We show that in-context le….
RT @scychan_brains: Some years ago, I got trapped in a Massive Trough of Imposter Syndrome. It took more than a year to dig myself out of….
RT @OpenAI: We’re releasing PaperBench, a benchmark evaluating the ability of AI agents to replicate state-of-the-art AI research, as part….
RT @PoShenLoh: Oh my goodness. GPT-o1 got a perfect score on my @CarnegieMellon undergraduate #math exam, taking less than a minute to solv….
RT @ted_moskovitz: New work led by @Aaditya6284! This was a really fun and interesting project, and I think there are a lot of cool insigh….
RT @_rockt: Fascinating new paper on learning dynamics of in-context learners: "Strategy Coopetition Explains the Emergence and Transience….
RT @sama: we trained a new model that is good at creative writing (not sure yet how/when it will get released). this is the first time i ha….
RT @scychan_brains: This paper is dedicated to our close collaborator @FelixHill84, who passed away recently. This is our last ever paper w….