Rahul Ramesh Profile
Rahul Ramesh

@RahulRam3sh

Followers: 202
Following: 478
Media: 6
Statuses: 49

PhD student @GraspLab, University of Pennsylvania | Undergrad @iitmcse

Joined November 2018
@RahulRam3sh
Rahul Ramesh
1 year
I’ll be at ICML in Vienna next week! Looking forward to presenting our work on compositional generalization in Transformers. 📑 Arxiv: 💻 Code: 🧵 A thread summarizing our results.
[image attached]
3
27
161
@RahulRam3sh
Rahul Ramesh
9 days
RT @randall_balestr: Learning by input-space reconstruction is often inefficient and hard to get right (compared to joint-embedding). While…
0
15
0
@RahulRam3sh
Rahul Ramesh
7 months
RT @EkdeepL: New paper–accepted as *spotlight* at #ICLR2025! 🧵👇 We show a competition dynamic between several algorithms splits a toy mode…
0
35
0
@RahulRam3sh
Rahul Ramesh
9 months
RT @corefpark: I will be presenting our work on:
- Reproducing many in-context learning phenomena
- Identifying a phase diagram of ICL
- Ex…
0
13
0
@RahulRam3sh
Rahul Ramesh
10 months
RT @EkdeepL: Paper alert—accepted as a NeurIPS *Spotlight*! 🧵👇 We build on our past work relating emergence to task compositionality and an…
0
92
0
@RahulRam3sh
Rahul Ramesh
1 year
Drop by poster #700 between 11:30 and 1:00!
@RahulRam3sh
Rahul Ramesh
1 year
RT @bemoniri: This paper is accepted to #ICML2024!
0
6
0
@RahulRam3sh
Rahul Ramesh
1 year
RT @Hidenori8Tanaka: Q: Can Transformers generalize by composing functions? If so, how? A: Yes, they achieve combinatorial generalization!…
0
6
0
@RahulRam3sh
Rahul Ramesh
1 year
And finally, a massive shoutout to my amazing collaborators @EkdeepL @KhonaMikail Robert Dick and @Hidenori8Tanaka. This work was done during my internship at @NttResearch Harvard, and I am super grateful to Hidenori for hosting me last summer.
1
0
7
@RahulRam3sh
Rahul Ramesh
1 year
The results hint at why scratchpads and chain-of-thought are such powerful ideas for LLMs, and they suggest that compositional generalization is a useful lens for understanding the success of these methods.
1
1
6
@RahulRam3sh
Rahul Ramesh
1 year
Check out our paper 📰 for more results on: (1) how the choice of functions changes the ability to compositionally generalize; (2) the failure of LSTMs on this task; (3) the training dynamics of compositional generalization.
1
0
3
@RahulRam3sh
Rahul Ramesh
1 year
Our experiments suggest a particular mechanistic hypothesis for this task: the attention layers select which function to apply, and the MLP layers execute it. We observe this consistently across Transformers of different sizes!!
[image attached]
1
1
19
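(Editorial toy sketch, not from the paper or its code: one way to picture the hypothesized split, with a hard attention-like selection choosing the task token relevant at each step and a lookup-table "MLP" executing it. All names and sizes below are assumptions.)

```python
# Toy picture of the hypothesis (my framing, not the paper's analysis):
# a hard attention-like step selects which task token matters at each
# position, and an MLP-like lookup executes that function on the value.
import random

random.seed(0)
VOCAB_SIZE, NUM_FUNCTIONS = 16, 4          # assumed sizes
tables = [[random.randrange(VOCAB_SIZE) for _ in range(VOCAB_SIZE)]
          for _ in range(NUM_FUNCTIONS)]

def attention_select(step, task_tokens):
    """'Attention': one-hot weights over task tokens pick the current function."""
    weights = [1.0 if i == step else 0.0 for i in range(len(task_tokens))]
    return round(sum(w * t for w, t in zip(weights, task_tokens)))

def mlp_execute(func_id, value):
    """'MLP': apply the selected function (a lookup table) to the running value."""
    return tables[func_id][value]

task_tokens, value = [2, 0, 3], 7          # toy query: compute F3(F0(F2(7)))
for step in range(len(task_tokens)):
    value = mlp_execute(attention_select(step, task_tokens), value)
print(value)
```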
@RahulRam3sh
Rahul Ramesh
1 year
We also sprinkle spurious correlations into the training data (in-order compositions) and find that this systematically results in a failure to generalize to out-of-order compositions.
[image attached]
1
0
5
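(A minimal sketch of what "in-order" versus "out-of-order" compositions could look like in such a setup; this is an illustrative framing rather than the paper's data pipeline, and NUM_FUNCTIONS / CHAIN_LENGTH are assumed values.)

```python
# Illustrative framing of the spurious correlation (not the paper's pipeline):
# training chains always list function indices in a fixed canonical order,
# while evaluation chains may permute them.
import random

NUM_FUNCTIONS, CHAIN_LENGTH = 8, 3         # assumed sizes

def sample_in_order():
    """Training-style chain: indices sorted into a canonical order."""
    return sorted(random.sample(range(NUM_FUNCTIONS), CHAIN_LENGTH))

def sample_out_of_order():
    """Evaluation-style chain: the same kind of functions in arbitrary order."""
    return random.sample(range(NUM_FUNCTIONS), CHAIN_LENGTH)
```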
@RahulRam3sh
Rahul Ramesh
1 year
In contrast, Transformers that directly generate the final output of the function composition do not generalize. Generalizing to unseen compositions is an OOD task, but the step-by-step format breaks it down into multiple sub-tasks that are "in-distribution".
[image attached]
1
2
8
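(A small sketch contrasting the two output formats discussed in the thread: a "direct" target that contains only the final answer, and a "step-by-step" target that also spells out every intermediate value. The tokenization is an assumption, and `functions` is assumed to be a list of lookup tables, one per task token.)

```python
# Sketch of the two target formats (assumed tokenization, not the paper's):
# `functions` is a list of lookup tables, one per task token.
def direct_sequence(chain, x, functions):
    """Only the final answer: [F1, F2, F3, X, F3(F2(F1(X)))]."""
    y = x
    for f in chain:
        y = functions[f][y]
    return list(chain) + [x, y]

def step_by_step_sequence(chain, x, functions):
    """Every intermediate value: [F1, F2, F3, X, F1(X), F2(F1(X)), F3(F2(F1(X)))]."""
    seq, y = list(chain) + [x], x
    for f in chain:
        y = functions[f][y]
        seq.append(y)
    return seq
```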
@RahulRam3sh
Rahul Ramesh
1 year
Transformers that generate intermediate steps of the composition can be trained on as few as 100 function compositions but surprisingly generalize to 4 million unseen compositions — a combinatorial explosion!!! 🤯
1
3
9
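(For intuition only: with purely hypothetical numbers chosen to make the point, and not taken from the paper, the number of ordered compositions already runs into the millions, so a few hundred training compositions is a vanishing fraction of the space.)

```python
# Back-of-the-envelope count with made-up numbers (not the paper's setup):
# ordered chains of k distinct functions drawn from a pool of n.
from math import perm

n_functions, chain_length = 40, 4          # hypothetical pool size and chain length
total = perm(n_functions, chain_length)    # 40 * 39 * 38 * 37
print(f"{total:,} possible compositions")  # 2,193,360
```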
@RahulRam3sh
Rahul Ramesh
1 year
We train autoregressive Transformers on synthetic data to study how details of the data-generating process affect compositional generalization. The most striking differences occur between Transformers that generate intermediate steps of the composition and those that omit them.
[image attached]
1
1
5
@RahulRam3sh
Rahul Ramesh
1 year
The compositional structure of language is complex. We instead consider a simple synthetic setup where data is generated by a linear chain of compositions, i.e., given task tokens F1, F2, F3 and an input token X, the goal is to generate F3(F2(F1(X))).
1
0
5
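(A minimal sketch of the kind of data-generating process described above, assuming the functions are random lookup tables; the constants and representation are illustrative, not the paper's configuration.)

```python
# Illustrative only: sample (task tokens, input, target) for a linear chain
# of compositions F3(F2(F1(X))).
import random

NUM_FUNCTIONS = 8    # size of the function vocabulary (assumed)
VOCAB_SIZE = 16      # size of the input/output token vocabulary (assumed)
CHAIN_LENGTH = 3     # number of functions composed per example

random.seed(0)

# Each "function" is a random lookup table over the token vocabulary.
functions = [[random.randrange(VOCAB_SIZE) for _ in range(VOCAB_SIZE)]
             for _ in range(NUM_FUNCTIONS)]

def sample_example():
    """Return (task tokens, input token, final output) for one composition."""
    chain = random.sample(range(NUM_FUNCTIONS), CHAIN_LENGTH)  # e.g. [F1, F2, F3]
    x = random.randrange(VOCAB_SIZE)
    y = x
    for f in chain:              # apply F1 first, then F2, then F3
        y = functions[f][y]
    return chain, x, y

print(sample_example())
```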
@RahulRam3sh
Rahul Ramesh
1 year
Natural language has rich compositional structure. This motivates the question: what do Transformers learn when trained on a compositional data-generating process? 🤔
1
0
5
@RahulRam3sh
Rahul Ramesh
1 year
RT @docmilanfar: Such an important lesson - even the very best, the most successful, may barely win more than half the contested points…
0
47
0
@RahulRam3sh
Rahul Ramesh
1 year
RT @EshwarERA: A tweet thread about our recent paper on (Pareto) optimal learning algorithms for repeated games; i.e., how to learn to play…
0
1
0