
Andrey Gromov
@Andr3yGR
Followers: 249 · Following: 780 · Media: 11 · Statuses: 57
Meta FAIR Research Scientist & physics professor at University of Maryland, College Park
Bay Area
Joined June 2009
Excited to be a part of this!
Our new Simons Collaboration on the Physics of Learning and Neural Computation will employ and develop powerful tools from #physics, #math, computer science and theoretical #neuroscience to understand how large neural networks learn, compute, scale, reason and imagine:
There are more experiments and visualizations in the paper https://t.co/qAU8KBUwEP. Routing and conditional computation should be taken more seriously. 10/
arxiv.org
We introduce and train distributed neural architectures (DNA) in vision and language domains. DNAs are initialized with a proto-architecture that consists of (transformer, MLP, attention, etc.)...
We experimented with breaking transformer blocks into Attention and MLP. Then we let DNA models decide how to stack them. We find that models generally prefer more attention early on and more MLP later on. 9/
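For a concrete picture of what "letting the model decide how to stack Attention and MLP" could look like, here is a minimal PyTorch sketch (not the paper's implementation; all names are illustrative) in which each depth holds both block types and a learned per-layer weight decides how much of each to use. Inspecting those weights after training is one way a "more attention early, more MLP later" pattern would show up.

```python
# Hedged sketch of 9/: each layer holds both an attention module and an MLP,
# and a learned logit pair decides, per layer, how much of each to use.
# This is an illustration, not the DNA paper's exact mechanism.
import torch
import torch.nn as nn


class SoftBlockChoice(nn.Module):
    def __init__(self, d_model=64, n_layers=8):
        super().__init__()
        self.attn = nn.ModuleList(
            [nn.MultiheadAttention(d_model, 4, batch_first=True) for _ in range(n_layers)])
        self.mlp = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                           nn.Linear(4 * d_model, d_model)) for _ in range(n_layers)])
        # One logit pair per layer: [attention, MLP]; trained end to end.
        self.choice = nn.Parameter(torch.zeros(n_layers, 2))

    def forward(self, x):
        for attn, mlp, logits in zip(self.attn, self.mlp, self.choice):
            w = torch.softmax(logits, dim=-1)
            a, _ = attn(x, x, x)
            x = x + w[0] * a + w[1] * mlp(x)
        return x


model = SoftBlockChoice()
print(torch.softmax(model.choice, dim=-1))   # per-depth attention vs. MLP preference
```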
The paths specialize to structures ranging from simple to complex: forms of "to be", sentence-level attention, commas, "this" and "that". 8/
We find that in language the paths followed by the tokens are distributed according to a power law. This reflects the extreme diversity of linguistic structures. Language DNAs are sparse right away. 7/
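One quick way to check a claim like this on one's own routing logs (a sketch on synthetic data, not the paper's analysis): count how often each distinct path occurs and look at the rank-frequency curve in log-log coordinates, where a power law appears as a roughly straight line.

```python
# Sketch of the power-law check in 7/: record each token's path (sequence of
# module indices), count path frequencies, and fit the log-log rank-frequency
# curve. The paths here are synthetic stand-ins.
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)
# Synthetic "paths": each path is a tuple of module choices, one per routing step.
paths = [tuple(rng.zipf(2.0, size=4) % 8) for _ in range(50_000)]

counts = np.array(sorted(Counter(paths).values(), reverse=True), dtype=float)
ranks = np.arange(1, len(counts) + 1)

# A power law shows up as an approximately straight line on log-log axes;
# the slope of a least-squares fit estimates the exponent.
slope, intercept = np.polyfit(np.log(ranks), np.log(counts), 1)
print(f"{len(counts)} distinct paths, rank-frequency slope ~ {slope:.2f}")
```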
Attention modules show emergent, dynamical (i.e., input-dependent) sparsity. Different attention/transformer modules focus on objects, background, or boundaries. The model is trying to segment the image. 6/
Furthermore, using deep-dream-like methods we can recover many features of the input image just from knowing the paths (essentially collections of integers) that the image takes through the DNA. This gives an idea of how informative the paths are. 5/
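The mechanism behind such a reconstruction can be illustrated with a toy example: starting from noise, optimize the input so that a model's (soft) routing decisions reproduce a recorded path. The tiny router below is a stand-in, not the DNA architecture, and the paper's deep-dream procedure operates on images rather than a single embedding.

```python
# Toy deep-dream-style reconstruction in the spirit of 5/: optimize an input so
# that the routing decisions match a recorded path (a list of integers).
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
d, n_modules, n_steps = 64, 4, 3
routers = nn.ModuleList([nn.Linear(d, n_modules) for _ in range(n_steps)])

def path_logits(x):
    """Routing logits at each step for a single token/patch embedding x."""
    return torch.stack([r(x) for r in routers])       # (n_steps, n_modules)

# A "path" is just a list of module indices, one per step.
with torch.no_grad():
    original = torch.randn(d)
    target_path = path_logits(original).argmax(-1)    # e.g. tensor([2, 0, 3])

recon = torch.randn(d, requires_grad=True)            # start from noise
opt = torch.optim.Adam([recon], lr=0.1)
for _ in range(300):
    opt.zero_grad()
    loss = F.cross_entropy(path_logits(recon), target_path)
    loss.backward()
    opt.step()

print(target_path, path_logits(recon).argmax(-1))     # reconstruction follows the same path
```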
We find that the paths tokens take through the DNA are interpretable. Patches/tokens with similar content or context take the same paths. (Something similar should hold for classic MoE, but we have not checked yet.) 4/
DNAs show emergent connectivity and computation that are very different from their dense counterparts, while showing competitive performance at ~25% fewer FLOPs. Vision models are dense in their first half and sparse in the second. 3/
We taught DNAs to allocate compute based on the content and context of each token/patch. The model's choices are human-interpretable and tell us that the vision model is essentially segmenting the image. Images that are hard to segment cost more compute. 2/
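As a rough illustration of content-dependent compute allocation (an assumption-laden sketch, not the DNA paper's actual routing rule): a small router scores each token and only the highest-scoring tokens are sent through an expensive module, so "harder" inputs consume more compute.

```python
# Hedged sketch of per-token conditional compute as in 2/. Names are illustrative.
import torch
import torch.nn as nn


class TokenRouter(nn.Module):
    def __init__(self, d_model=64, capacity=0.5):
        super().__init__()
        self.score = nn.Linear(d_model, 1)            # content-based routing score
        self.expensive = nn.Sequential(               # stand-in for an attention/MLP module
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model))
        self.capacity = capacity                      # fraction of tokens that get compute

    def forward(self, x):                             # x: (batch, tokens, d_model)
        b, t, d = x.shape
        s = self.score(x).squeeze(-1)                 # (batch, tokens)
        k = max(1, int(self.capacity * t))
        topk = s.topk(k, dim=-1).indices              # tokens judged "hard" enough
        mask = torch.zeros_like(s).scatter_(-1, topk, 1.0).unsqueeze(-1)
        gate = torch.sigmoid(s).unsqueeze(-1)         # keeps routing differentiable
        # In a real implementation only the selected tokens would be gathered and
        # processed; here everything is computed and masked for clarity.
        return x + mask * gate * self.expensive(x)


x = torch.randn(2, 16, 64)
print(TokenRouter()(x).shape)                         # torch.Size([2, 16, 64])
```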
Do neural networks have to be feed-forward? We built a collection of Distributed Neural Architectures (DNAs) in vision and language domains where all modules can talk to each other at the same time and non-feedforward connectivity emerges from end-to-end training. 1/
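To make the non-feedforward idea concrete, here is a minimal, hypothetical PyTorch sketch of a pool of attention and MLP modules that can all read from each other through a learned routing matrix, so the connectivity is discovered by end-to-end training rather than fixed as a stack. This is only an illustration of the idea in 1/, not the authors' DNA code.

```python
# Minimal sketch of 1/: a pool of modules with a learned module-to-module routing
# matrix instead of a fixed feed-forward stack. Names are hypothetical.
import torch
import torch.nn as nn


class ModulePool(nn.Module):
    def __init__(self, n_modules=4, d_model=64, n_steps=6):
        super().__init__()
        # Alternate MLP blocks and single self-attention blocks.
        self.modules_ = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                           nn.Linear(4 * d_model, d_model))
             if i % 2 == 0 else
             nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
             for i in range(n_modules)])
        # Learned connectivity; training decides which edges matter, so the
        # resulting graph need not be feed-forward.
        self.route_logits = nn.Parameter(torch.zeros(n_steps, n_modules, n_modules))
        self.n_steps = n_steps

    def forward(self, x):
        states = [x for _ in self.modules_]            # every module starts from the input
        for t in range(self.n_steps):
            mix = torch.softmax(self.route_logits[t], dim=-1)   # (dst, src) weights
            new_states = []
            for i, mod in enumerate(self.modules_):
                # Each module reads a learned mixture of *all* modules' current states.
                inp = sum(mix[i, j] * states[j] for j in range(len(states)))
                if isinstance(mod, nn.MultiheadAttention):
                    out, _ = mod(inp, inp, inp)
                else:
                    out = mod(inp)
                new_states.append(inp + out)           # residual update
            states = new_states
        return sum(states) / len(states)


x = torch.randn(2, 16, 64)                             # (batch, tokens, d_model)
print(ModulePool()(x).shape)                           # torch.Size([2, 16, 64])
```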
What an incredible lineup of panelists and researchers! Super excited to attend this.
🚨 🧬 AI for Science Symposium 🔭🚨 We're gathering AI 4 Science leaders from industry (@vkhosla, @OpenAI), academia (@MoAlQuraishi, @iaifi_news), gov (@patrickshafto, @BerkeleyLab), and non-profits (@JoanneZPeng, @oziadias). Join us May 16 in SF. Registration link and more info ⬇️
Fun collaboration!
Our new work Spectral Journey https://t.co/1C4Hrxb2Ig shows a surprising finding: when a 2-layer Transformer is trained to predict the shortest path in a given graph, 1️⃣ it first implicitly computes the spectral embedding for each edge, i.e. eigenvectors of the normalized graph Laplacian …
arxiv.org
Decoder-only transformers lead to a step-change in capability of large language models. However, opinions are mixed as to whether they are really planning or reasoning. A path to making progress...
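For readers unfamiliar with the object mentioned in the quoted tweet: the spectral embedding is built from eigenvectors of the normalized graph Laplacian L = I − D^{−1/2} A D^{−1/2}. The NumPy sketch below computes it for a toy graph and forms a simple edge embedding by concatenating the endpoints' spectral coordinates; it does not reproduce the Transformer analysis in the paper.

```python
# Spectral embedding of a small graph via the normalized Laplacian.
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]      # a small undirected graph
n = 4
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

d_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
L = np.eye(n) - d_inv_sqrt @ A @ d_inv_sqrt           # normalized graph Laplacian

eigvals, eigvecs = np.linalg.eigh(L)                  # columns are eigenvectors
node_embedding = eigvecs[:, 1:3]                      # skip the trivial first eigenvector

# One natural edge embedding: concatenate the spectral coordinates of the endpoints.
edge_embedding = {e: np.concatenate([node_embedding[e[0]], node_embedding[e[1]]])
                  for e in edges}
print(edge_embedding[(0, 1)])
```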
Interested in mechanistic interpretability of how Transformers learn in-context via skill composition? Come to our #NeurIPS2024 Oral presentation! 📅 Wed, Dec 11 ⏰ 10:00 AM (oral), 11:00 AM - 2 PM (poster) 📍East Ballroom A-B (oral), East Exhibit Hall A-C #3200 (poster)
John Hopfield has a nice article in the Annual Review of Condensed Matter Physics. It starts off with a discussion of what physics is, which I think is totally on point.
The Nobel Committee recognizes profound contributions from Physics to ML / AI. There's a lot more where that came from. We are in an era where an increasing number of physicists are making important contributions to ML / AI, and even more are needed going forward.
BREAKING NEWS The Royal Swedish Academy of Sciences has decided to award the 2024 #NobelPrize in Physics to John J. Hopfield and Geoffrey E. Hinton “for foundational discoveries and inventions that enable machine learning with artificial neural networks.”