Anand Gopalakrishnan

@agopal42

Followers: 309 · Following: 930 · Media: 16 · Statuses: 128

Postdoc at @Harvard with @du_yilun and @gershbrain. PhD with @SchmidhuberAI. Previously: Apple MLR, AWS AI Lab. 7. Same handle on 🦋

Cambridge, MA
Joined January 2018
@agopal42
Anand Gopalakrishnan
1 year
Excited to present "Recurrent Complex-Weighted Autoencoders for Unsupervised Object Discovery" at #NeurIPS2024! TL;DR: Our model, SynCx, greatly simplifies the inductive biases and training procedures of current state-of-the-art synchrony models. Thread 👇 1/x.
2 · 41 · 165
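The thread itself has no code, so the following is only a hedged toy sketch of the general mechanics behind synchrony-based grouping with a complex-weighted autoencoder (iterate the autoencoder, clamp magnitudes to the input, read groups out of the phases). It is not the authors' SynCx implementation; the shapes, iteration count, and k-means readout are all assumptions, and with untrained random weights it only shows the plumbing rather than meaningful grouping.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "image": 64 pixels with real-valued intensities.
x = rng.random(64)

# Complex-valued weights of a tiny autoencoder. In synchrony models, the
# *phase* of each unit is what carries the grouping information.
W_enc = (rng.standard_normal((16, 64)) + 1j * rng.standard_normal((16, 64))) / 8.0
W_dec = (rng.standard_normal((64, 16)) + 1j * rng.standard_normal((64, 16))) / 4.0

# Start pixels at random phases, then iterate the autoencoder, feeding the
# output back in while clamping magnitudes to the observed image.
z = x * np.exp(1j * rng.uniform(0, 2 * np.pi, size=64))
for _ in range(10):
    out = W_dec @ (W_enc @ z)
    z = x * np.exp(1j * np.angle(out))   # keep magnitudes, update phases only

# Read out a grouping by clustering the final phases on the unit circle
# (2 clusters here, standing in for an expected number of objects).
feats = np.stack([np.cos(np.angle(z)), np.sin(np.angle(z))], axis=1)
centers = feats[rng.choice(64, size=2, replace=False)]
for _ in range(20):
    labels = ((feats[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
    centers = np.stack([feats[labels == k].mean(0) if (labels == k).any() else centers[k]
                        for k in range(2)])

print("per-pixel assignment:", labels)
```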
@aakaran31
Aayush Karan
22 days
We found a new way to get language models to reason. 🤯 No RL, no training, no verifiers, no prompting. ❌ With better sampling, base models can achieve single-shot reasoning on par with (or better than!) GRPO while avoiding its characteristic loss in generation diversity.
68 · 251 · 2K
@mc_mozer
Michael C. Mozer
22 days
[1/4] As you read words in this text, your brain adjusts fixation durations to facilitate comprehension. Inspired by human reading behavior, we propose a supervised objective that trains an LLM to dynamically determine the number of compute steps for each input token.
4 · 10 · 25
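The paper is not linked in this snippet, so here is only a hedged sketch of the general pattern the tweet describes (per-token adaptive compute with a learned halting head, in the spirit of adaptive computation time), not the authors' objective or architecture. `PerTokenHaltingBlock`, the step/halt heads, and the 0.5 threshold are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PerTokenHaltingBlock(nn.Module):
    """Toy per-token adaptive compute: each token is refined by a shared step
    until a halting head says 'stop' (or a cap is hit). The number of steps a
    token uses could then be supervised, e.g. against reading-time targets,
    as the tweet describes at a high level."""

    def __init__(self, d_model: int, max_steps: int = 4):
        super().__init__()
        self.step = nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(),
                                  nn.Linear(d_model, d_model))
        self.halt = nn.Linear(d_model, 1)    # per-token halting logit
        self.max_steps = max_steps

    def forward(self, h):                    # h: (batch, seq, d_model)
        halted = torch.zeros(h.shape[:2], dtype=torch.bool, device=h.device)
        steps_used = torch.zeros(h.shape[:2], device=h.device)
        for _ in range(self.max_steps):
            update = self.step(h)
            # Only tokens that have not halted keep refining their state.
            h = torch.where(halted.unsqueeze(-1), h, h + update)
            steps_used = steps_used + (~halted).float()
            p_halt = torch.sigmoid(self.halt(h)).squeeze(-1)
            halted = halted | (p_halt > 0.5)
        return h, steps_used                 # steps_used is what a supervised objective could target

x = torch.randn(2, 7, 64)
out, steps = PerTokenHaltingBlock(d_model=64)(x)
print(out.shape, steps)
```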
@t_andy_keller
Andy Keller
4 months
Why do video models handle motion so poorly? It might be lack of motion equivariance. Very excited to introduce: Flow Equivariant RNNs (FERNNs), the first sequence models to respect symmetries over time. Paper: https://t.co/dkk43PyQe3 Blog: https://t.co/I1gpam1OL8 1/🧵
8 · 72 · 398
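As background for what "equivariance" means for a sequence model, here is a small numpy sketch (not from the paper) showing that a circular-convolution RNN is equivariant to *static* spatial shifts: shifting every input frame shifts the final hidden state by the same amount. Flow equivariance in FERNNs, as the tweet describes it, extends this to time-varying transformations (e.g., inputs moving with some velocity), which an ordinary conv RNN does not respect; this snippet only sets up the baseline notion.

```python
import numpy as np

rng = np.random.default_rng(0)

def circ_conv(x, k):
    """Circular 1-D convolution (shift-equivariant by construction)."""
    n = len(x)
    return np.array([sum(k[j] * x[(i - j) % n] for j in range(len(k))) for i in range(n)])

def conv_rnn(inputs, k_in, k_rec):
    """Tiny convolutional RNN: h_t = tanh(k_in * x_t + k_rec * h_{t-1})."""
    h = np.zeros_like(inputs[0])
    for x in inputs:
        h = np.tanh(circ_conv(x, k_in) + circ_conv(h, k_rec))
    return h

k_in, k_rec = rng.standard_normal(3), rng.standard_normal(3)
seq = [rng.standard_normal(16) for _ in range(5)]

shift = 4
shifted_seq = [np.roll(x, shift) for x in seq]

# Static shift equivariance: shifting every frame shifts the final hidden state.
h = conv_rnn(seq, k_in, k_rec)
h_shifted = conv_rnn(shifted_seq, k_in, k_rec)
print(np.allclose(np.roll(h, shift), h_shifted))  # True
```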
@idivinci
Vincent Herrmann
4 months
Excited to share our new ICML paper, with co-authors @robert_csordas and @SchmidhuberAI! How can we tell if an LLM is actually "thinking" versus just spitting out memorized or trivial text? Can we detect when a model is doing anything interesting? (Thread below👇)
5 · 52 · 203
@robert_csordas
Csordás Róbert
5 months
Your language model is wasting half of its layers to just refine probability distributions rather than doing interesting computations. In our paper, we found that the second half of the layers of the Llama 3 models have minimal effect on future computations. 1/6
35 · 139 · 1K
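The thread's actual analysis is not shown here; as a hedged illustration of one way to probe such a claim, below is a toy layer-skipping ablation in PyTorch: run a stack of blocks with and without its second half and measure how much the output distribution changes. The toy stack, the KL readout, and all sizes are assumptions, not the paper's methodology or its Llama 3 setup.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

d_model, n_layers, vocab = 64, 8, 100

# Toy decoder stack standing in for an LLM.
layers = nn.ModuleList([
    nn.TransformerEncoderLayer(d_model, nhead=4, dim_feedforward=128, batch_first=True)
    for _ in range(n_layers)
])
unembed = nn.Linear(d_model, vocab)
layers.eval()  # disable dropout for a deterministic comparison

def forward(h, active_layers):
    for i, layer in enumerate(layers):
        if i in active_layers:
            h = layer(h)
    return unembed(h).softmax(-1)

h0 = torch.randn(1, 10, d_model)
with torch.no_grad():
    p_full = forward(h0, active_layers=set(range(n_layers)))
    p_early = forward(h0, active_layers=set(range(n_layers // 2)))  # skip the 2nd half

# How much do the later layers change the predicted distribution?
kl = (p_full * (p_full / p_early).log()).sum(-1).mean()
print(f"avg KL(full || first-half-only): {kl.item():.4f}")
```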
@t_andy_keller
Andy Keller
8 months
In the physical world, almost all information is transmitted through traveling waves -- why should it be any different in your neural network? Super excited to share recent work with the brilliant @mozesjacobs: "Traveling Waves Integrate Spatial Information Through Time" 1/14
147 · 916 · 7K
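To make the physical intuition concrete (again, not the paper's model), here is a toy numpy simulation of a discretized 1-D wave equation on a ring: each update only touches a position's immediate neighbors, yet a localized stimulus becomes visible at ever more distant positions as time passes, i.e. spatial information is carried across the sheet through time.

```python
import numpy as np

n, steps, c = 64, 60, 0.5      # positions on a ring, timesteps, wave speed (<1 for stability)
u_prev = np.zeros(n)
u = np.zeros(n)
u[n // 2] = 1.0                # localized "stimulus" in the middle

footprint = []
for t in range(steps):
    lap = np.roll(u, 1) - 2 * u + np.roll(u, -1)      # purely local neighbor coupling
    u_next = 2 * u - u_prev + (c ** 2) * lap          # leapfrog update of the wave equation
    u_prev, u = u, u_next
    footprint.append(int((np.abs(u) > 1e-6).sum()))   # how many positions are active

# The set of positions that have "heard about" the stimulus grows (roughly
# linearly) with time, even though every individual update is local.
print(footprint[::10])
```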
@SchmidhuberAI
Jürgen Schmidhuber
8 months
Congratulations to @RichardSSutton and Andy Barto on their Turing award!
29 · 121 · 1K
@TheOfficialACM
Association for Computing Machinery
8 months
Meet the recipients of the 2024 ACM A.M. Turing Award, Andrew G. Barto and Richard S. Sutton! They are recognized for developing the conceptual and algorithmic foundations of reinforcement learning. Please join us in congratulating the two recipients! https://t.co/GrDfgzW1fL
34 · 470 · 2K
@AmiiThinks
Amii
8 months
BREAKING: Amii Chief Scientific Advisor, Richard S. Sutton, has been awarded the A.M. Turing Award, the highest honour in computer science, alongside Andrew Barto! Read the official @TheOfficialACM announcement: https://t.co/JXDhdEsQv7 #TuringAward #AI #ReinforcementLearning
5 · 50 · 234
@agopal42
Anand Gopalakrishnan
11 months
Come visit our poster at East Exhibit Hall A-C #3707 today (Thursday) between 4:30 and 7:30pm to learn about how complex-valued NNs perform perceptual grouping. #NeurIPS2024
0 · 0 · 10
@AggieInCA
Vimal Thilak🦉🐒
11 months
Interested in JEPA/visual representation learning for diverse downstream tasks like planning and reasoning? Check out "Enhancing JEPAs with Spatial Conditioning: Robust and Efficient Representation Learning" at the @NeurIPSConf SSL Workshop on 12/14. Led by @EtaiLittwin (1/n)
2 · 7 · 20
@SchmidhuberAI
Jürgen Schmidhuber
11 months
Please check out a dozen 2024 conference papers with my awesome students, postdocs, and collaborators: 3 papers at NeurIPS, 5 at ICML, others at CVPR, ICLR, ICRA: 288. R. Csordas, P. Piekos, K. Irie, J. Schmidhuber. SwitchHead: Accelerating Transformers with Mixture-of-Experts
[Link card: arxiv.org — "Despite many recent works on Mixture of Experts (MoEs) for resource-efficient Transformer language models, existing methods mostly focus on MoEs for feedforward layers. Previous attempts at..."]
10 · 119 · 327
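For readers unfamiliar with the mixture-of-experts setup the card's abstract refers to, here is a generic switch-style (top-1 routed) MoE feedforward layer in PyTorch. This is only a hedged illustration of the standard feedforward-MoE baseline the abstract contrasts against; SwitchHead's actual contribution is applying MoE inside the attention mechanism, which this sketch does not reproduce.

```python
import torch
import torch.nn as nn

class SwitchFFN(nn.Module):
    """Generic top-1 routed mixture-of-experts feedforward layer (illustrative)."""

    def __init__(self, d_model=64, d_ff=128, n_experts=4):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                        # x: (batch, seq, d_model)
        gate = self.router(x).softmax(-1)        # routing probabilities per token
        top_p, top_idx = gate.max(-1)            # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e                  # tokens routed to expert e
            if mask.any():
                out[mask] = top_p[mask].unsqueeze(-1) * expert(x[mask])
        return out

x = torch.randn(2, 5, 64)
print(SwitchFFN()(x).shape)                      # torch.Size([2, 5, 64])
```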
@agopal42
Anand Gopalakrishnan
1 year
Let's see how this goes...
0 · 0 · 0
@agopal42
Anand Gopalakrishnan
1 year
0 · 0 · 0
@agopal42
Anand Gopalakrishnan
1 year
What happened here?! Lol
2 · 0 · 2
@agopal42
Anand Gopalakrishnan
1 year
Phase synchronization towards objects is more robust in SynCx than in the baselines. It can successfully separate similarly colored objects, a common failure mode of other synchrony models, which rely on color as a shortcut feature for grouping. 9/x
1 · 0 · 0
@agopal42
Anand Gopalakrishnan
1 year
SynCx outperforms current state-of-the-art unsupervised synchrony-based models on standard multi-object datasets while using 6-23x fewer model parameters than the baselines. 8/x
1 · 0 · 1
@agopal42
Anand Gopalakrishnan
1 year
Our model needs none of the additional inductive biases (gating mechanisms), strong supervision (depth masks), or contrastive training used by current state-of-the-art synchrony models to achieve phase synchronization towards objects in a fully unsupervised way. 7/x
1 · 0 · 0