Rahul Ramesh Profile
Rahul Ramesh

@RahulRam3sh

Followers: 202
Following: 478
Media: 6
Statuses: 49

PhD student @GraspLab, University of Pennsylvania | Undergrad @iitmcse

Joined November 2018
@RahulRam3sh
Rahul Ramesh
1 year
I’ll be at ICML in Vienna next week! Looking forward to presenting our work on compositional generalization in Transformers. 📑 Arxiv: 💻 Code: 🧵 A thread summarizing our results.
[image attached]
3
27
161
@RahulRam3sh
Rahul Ramesh
9 days
RT @randall_balestr: Learning by input-space reconstruction is often inefficient and hard to get right (compared to joint-embedding). While…
0
15
0
@RahulRam3sh
Rahul Ramesh
7 months
RT @EkdeepL: New paper–accepted as *spotlight* at #ICLR2025! 🧵👇 We show a competition dynamic between several algorithms splits a toy mode…
0
35
0
@RahulRam3sh
Rahul Ramesh
9 months
RT @corefpark: I will be presenting our work on:
- Reproducing many in-context learning phenomena
- Identifying a phase diagram of ICL
- Ex…
0
13
0
@RahulRam3sh
Rahul Ramesh
10 months
RT @EkdeepL: Paper alert—accepted as a NeurIPS *Spotlight*! 🧵👇 We build on our past work relating emergence to task compositionality and an…
0
92
0
@RahulRam3sh
Rahul Ramesh
1 year
Drop by poster #700 between 11:30 and 1:00!
@RahulRam3sh
Rahul Ramesh
1 year
RT @bemoniri: This paper is accepted to #ICML2024!
0
6
0
@RahulRam3sh
Rahul Ramesh
1 year
RT @Hidenori8Tanaka: Q: Can Transformers generalize by composing functions? If so, how? A: Yes, they achieve combinatorial generalization!…
0
6
0
@RahulRam3sh
Rahul Ramesh
1 year
And finally, a massive shoutout to my amazing collaborators @EkdeepL @KhonaMikail Robert Dick and @Hidenori8Tanaka. This work was done during my internship at @NttResearch Harvard, and I am super grateful to Hidenori for hosting me last summer.
1
0
7
@RahulRam3sh
Rahul Ramesh
1 year
The results hint at why scratchpads and chain-of-thought are such powerful ideas for LLMs, and they suggest that compositional generalization is a useful lens for understanding the success of these methods.
1
1
6
@RahulRam3sh
Rahul Ramesh
1 year
Check out our paper 📰 for more results on: (1) how the choice of functions changes the ability to compositionally generalize; (2) the failure of LSTMs on this task; (3) the training dynamics of compositional generalization.
1
0
3
@RahulRam3sh
Rahul Ramesh
1 year
Our experiments suggest a particular mechanistic hypothesis for this task: the attention layers select which function to apply, and the MLP layers execute it. We observe this consistently across Transformers of different sizes!!
[image attached]
1
1
19
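(Editorial toy sketch, not from the paper or its code: one way to picture the hypothesized split, with a hard attention-like selection choosing the task token relevant at each step and a lookup-table "MLP" executing it. All names and sizes below are assumptions.)

```python
# Toy picture of the hypothesis (my framing, not the paper's analysis):
# a hard attention-like step selects which task token matters at each
# position, and an MLP-like lookup executes that function on the value.
import random

random.seed(0)
VOCAB_SIZE, NUM_FUNCTIONS = 16, 4          # assumed sizes
tables = [[random.randrange(VOCAB_SIZE) for _ in range(VOCAB_SIZE)]
          for _ in range(NUM_FUNCTIONS)]

def attention_select(step, task_tokens):
    """'Attention': one-hot weights over task tokens pick the current function."""
    weights = [1.0 if i == step else 0.0 for i in range(len(task_tokens))]
    return round(sum(w * t for w, t in zip(weights, task_tokens)))

def mlp_execute(func_id, value):
    """'MLP': apply the selected function (a lookup table) to the running value."""
    return tables[func_id][value]

task_tokens, value = [2, 0, 3], 7          # toy query: compute F3(F0(F2(7)))
for step in range(len(task_tokens)):
    value = mlp_execute(attention_select(step, task_tokens), value)
print(value)
```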
@RahulRam3sh
Rahul Ramesh
1 year
We also sprinkle spurious correlations into the training data (in-order compositions) and find that this systematically results in a failure to generalize to out-of-order compositions.
[image attached]
1
0
5
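(A minimal sketch of what "in-order" versus "out-of-order" compositions could look like in such a setup; this is an illustrative framing rather than the paper's data pipeline, and NUM_FUNCTIONS / CHAIN_LENGTH are assumed values.)

```python
# Illustrative framing of the spurious correlation (not the paper's pipeline):
# training chains always list function indices in a fixed canonical order,
# while evaluation chains may permute them.
import random

NUM_FUNCTIONS, CHAIN_LENGTH = 8, 3         # assumed sizes

def sample_in_order():
    """Training-style chain: indices sorted into a canonical order."""
    return sorted(random.sample(range(NUM_FUNCTIONS), CHAIN_LENGTH))

def sample_out_of_order():
    """Evaluation-style chain: the same kind of functions in arbitrary order."""
    return random.sample(range(NUM_FUNCTIONS), CHAIN_LENGTH)
```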
@RahulRam3sh
Rahul Ramesh
1 year
In contrast, Transformers that directly generate the final output of the function composition do not generalize. Generalizing to unseen compositions is an OOD task, but the step-by-step format breaks it down into multiple sub-tasks that are "in-distribution".
[image attached]
1
2
8
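(A small sketch contrasting the two output formats discussed in the thread: a "direct" target that contains only the final answer, and a "step-by-step" target that also spells out every intermediate value. The tokenization is an assumption, and `functions` is assumed to be a list of lookup tables, one per task token.)

```python
# Sketch of the two target formats (assumed tokenization, not the paper's):
# `functions` is a list of lookup tables, one per task token.
def direct_sequence(chain, x, functions):
    """Only the final answer: [F1, F2, F3, X, F3(F2(F1(X)))]."""
    y = x
    for f in chain:
        y = functions[f][y]
    return list(chain) + [x, y]

def step_by_step_sequence(chain, x, functions):
    """Every intermediate value: [F1, F2, F3, X, F1(X), F2(F1(X)), F3(F2(F1(X)))]."""
    seq, y = list(chain) + [x], x
    for f in chain:
        y = functions[f][y]
        seq.append(y)
    return seq
```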
@RahulRam3sh
Rahul Ramesh
1 year
Transformers that generate intermediate steps of the composition can be trained on as few as 100 function compositions but surprisingly generalize to 4 million unseen compositions — a combinatorial explosion!!! 🤯
1
3
9
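(For intuition only: with purely hypothetical numbers chosen to make the point, and not taken from the paper, the number of ordered compositions already runs into the millions, so a few hundred training compositions is a vanishing fraction of the space.)

```python
# Back-of-the-envelope count with made-up numbers (not the paper's setup):
# ordered chains of k distinct functions drawn from a pool of n.
from math import perm

n_functions, chain_length = 40, 4          # hypothetical pool size and chain length
total = perm(n_functions, chain_length)    # 40 * 39 * 38 * 37
print(f"{total:,} possible compositions")  # 2,193,360
```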
@RahulRam3sh
Rahul Ramesh
1 year
We train autoregressive Transformers on synthetic data to study how details of the data-generating process affect compositional generalization. The most striking differences occur between Transformers that generate intermediate steps of the composition and those that omit them.
[image attached]
1
1
5
@RahulRam3sh
Rahul Ramesh
1 year
The compositional structure of language is complex. We instead consider a simple synthetic setup where data is generated by a linear chain of compositions, i.e., given task tokens F1, F2, F3 and an input token X, the goal is to generate F3(F2(F1(X))).
1
0
5
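(A minimal sketch of the kind of data-generating process described above, assuming the functions are random lookup tables; the constants and representation are illustrative, not the paper's configuration.)

```python
# Illustrative only: sample (task tokens, input, target) for a linear chain
# of compositions F3(F2(F1(X))).
import random

NUM_FUNCTIONS = 8    # size of the function vocabulary (assumed)
VOCAB_SIZE = 16      # size of the input/output token vocabulary (assumed)
CHAIN_LENGTH = 3     # number of functions composed per example

random.seed(0)

# Each "function" is a random lookup table over the token vocabulary.
functions = [[random.randrange(VOCAB_SIZE) for _ in range(VOCAB_SIZE)]
             for _ in range(NUM_FUNCTIONS)]

def sample_example():
    """Return (task tokens, input token, final output) for one composition."""
    chain = random.sample(range(NUM_FUNCTIONS), CHAIN_LENGTH)  # e.g. [F1, F2, F3]
    x = random.randrange(VOCAB_SIZE)
    y = x
    for f in chain:              # apply F1 first, then F2, then F3
        y = functions[f][y]
    return chain, x, y

print(sample_example())
```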
@RahulRam3sh
Rahul Ramesh
1 year
Natural language has rich compositional structure. This motivates the question: what do Transformers learn when trained on a compositional data-generating process? 🤔
1
0
5
@RahulRam3sh
Rahul Ramesh
1 year
RT @docmilanfar: Such an important lesson - even the very best, the most successful, may barely win more than half the contested points…
0
47
0
@RahulRam3sh
Rahul Ramesh
1 year
RT @EshwarERA: A tweet thread about our recent paper on (Pareto) optimal learning algorithms for repeated games; i.e., how to learn to play…
0
1
0