Andrey Gromov Profile
Andrey Gromov

@Andr3yGR

Followers: 232 · Following: 750 · Media: 11 · Statuses: 56

Meta FAIR Research Scientist & physics professor at University of Maryland, College Park

Bay Area
Joined June 2009
@Andr3yGR
Andrey Gromov
11 days
Excited to be a part of this!
@SimonsFdn
Simons Foundation
11 days
Our new Simons Collaboration on the Physics of Learning and Neural Computation will employ and develop powerful tools from #physics, #math, computer science and theoretical #neuroscience to understand how large neural networks learn, compute, scale, reason and imagine.
@Andr3yGR
Andrey Gromov
2 months
Thank you!
@Andr3yGR
Andrey Gromov
2 months
There are more experiments and visualizations in the paper. Routing and conditional computation should be taken more seriously. 10/
arxiv.org
We introduce and train distributed neural architectures (DNA) in vision and language domains. DNAs are initialized with a proto-architecture that consists of (transformer, MLP, attention, etc.)...
@Andr3yGR
Andrey Gromov
2 months
We experimented with breaking transformer blocks into Attention and MLP. Then we let DNA models decide how to stack them. We find that models generally prefer more attention early on and more MLP later on. 9/
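The "attention early, MLP later" observation can be quantified by tallying which module type the model chose at each depth position across token paths. A toy sketch with invented paths ("A" = attention, "M" = MLP; the data here is made up, not from the paper):

```python
# Hypothetical paths: each string is the sequence of module types
# one token routed through, from shallow to deep.
paths = ["AAMM", "AMAM", "AAAM", "AMMM", "AAMM"]

depth = len(paths[0])
# Fraction of tokens that picked an attention module at each depth.
attn_frac = [
    sum(p[d] == "A" for p in paths) / len(paths) for d in range(depth)
]
print(attn_frac)
```

On this toy data the fraction of attention modules falls monotonically with depth, which is the shape of the trend the tweet describes.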
@Andr3yGR
Andrey Gromov
2 months
The paths specialize to structures both simple and complex: versions of “to be”, sentence-level attention, commas, “this” and “that”. 8/
@Andr3yGR
Andrey Gromov
2 months
We find that in language, the paths followed by tokens are distributed according to a power law. This reflects the extreme diversity of language structures. Language DNAs are sparse right away. 7/
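A power-law (Zipf-like) rank–frequency curve is easy to check: sort path counts by rank and see whether log-frequency falls roughly linearly in log-rank. A sketch on synthetic Zipfian counts (real counts would come from a trained model; the data below is fabricated for illustration):

```python
import math

# Synthetic path counts following freq ~ 1/rank (Zipf, exponent 1).
counts = [round(1000 / r) for r in range(1, 51)]

# Least-squares slope of log(freq) vs log(rank).
xs = [math.log(r) for r in range(1, len(counts) + 1)]
ys = [math.log(c) for c in counts]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
print(f"estimated exponent: {slope:.2f}")
```

For genuinely Zipfian counts the estimated slope sits near -1; a markedly shallower slope would indicate paths are used much more uniformly.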
@Andr3yGR
Andrey Gromov
2 months
Attention modules show emergent, dynamical (i.e., input-dependent) sparsity. Different attention/transformer modules focus on objects, background, or boundaries. The model is effectively trying to segment the image. 6/
@Andr3yGR
Andrey Gromov
2 months
Furthermore, using deep-dream-like methods we can recover many features of the input image just from knowing the paths (essentially collections of integers) that an image takes through the DNA. This gives an idea of how informative the paths are. 5/
@Andr3yGR
Andrey Gromov
2 months
We find that paths that tokens take through the DNA are interpretable. Patches/tokens with similar content or context take the same paths. (Something similar should hold true for classic MoE, but we have not checked yet.) 4/
@Andr3yGR
Andrey Gromov
2 months
DNAs show emergent connectivity and computation that are very different from their dense counterparts, while showing competitive performance at ~25% fewer FLOPs. Vision models are dense in their first half and sparse in the second. 3/
@Andr3yGR
Andrey Gromov
2 months
We taught DNAs to allocate compute based on the content and context of each token/patch. The model's choices are human interpretable and tell us that the vision model is essentially segmenting the image. Images that are hard to segment cost more compute. 2/
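One way to read "hard images cost more compute" is a halting rule: keep applying modules until a per-token confidence clears a threshold, so the FLOPs spent depend on the input. A hypothetical sketch of that idea (names and the confidence model are invented for illustration, not the paper's mechanism):

```python
def adaptive_depth(difficulty, max_steps=8, threshold=0.9):
    """Spend more steps on harder inputs: confidence grows each step,
    more slowly for difficult inputs; stop once it clears the threshold."""
    confidence, steps = 0.0, 0
    while confidence < threshold and steps < max_steps:
        confidence += (1.0 - difficulty) * 0.4  # easy inputs gain fast
        steps += 1
    return steps

easy = adaptive_depth(0.1)  # easy token halts early
hard = adaptive_depth(0.8)  # hard token runs to the step budget
print(easy, hard)
```

The step count is the compute budget actually spent, so an image whose tokens are hard to resolve (e.g., hard to segment) ends up costing more.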
@Andr3yGR
Andrey Gromov
2 months
Do neural networks have to be feed-forward? We built a collection of Distributed Neural Architectures (DNAs) in vision and language domains where all modules can talk to each other at the same time and non-feedforward connectivity emerges from end-to-end training. 1/
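The thread doesn't include code, but the core idea can be sketched: each token carries a path (a list of module indices), and at every step a router scores all modules, including ones already visited, so non-feedforward connectivity can emerge. A toy sketch with invented modules and router weights (nothing here is from the paper):

```python
# Toy modules transforming a scalar "token state".
MODULES = {
    0: lambda h: h + 1.0,   # stand-in for an attention module
    1: lambda h: h * 0.5,   # stand-in for an MLP module
    2: lambda h: -h,        # another module
}

def route(h, weights):
    """Pick the next module by argmax over affine scores of the state.
    weights[m] = (bias, gain) is a stand-in for a learned router."""
    scores = {m: b + g * h for m, (b, g) in weights.items()}
    return max(scores, key=scores.get)

def forward(h, weights, steps=4):
    """Run a token through the module graph; any module may follow any
    other (including itself), so connectivity is not feed-forward."""
    path = []
    for _ in range(steps):
        m = route(h, weights)
        path.append(m)
        h = MODULES[m](m_state := h) if False else MODULES[m](h)
        h = h  # state carried to the next routing decision
    return h, path

weights = {0: (0.0, -1.0), 1: (0.0, 1.0), 2: (0.2, 0.0)}
h, path = forward(0.3, weights)
print(path)  # module 2 is revisited: the route is not feed-forward
```

The returned path is exactly the "collection of integers" the later tweets analyze for interpretability.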
@Andr3yGR
Andrey Gromov
2 months
New paper! Collaboration with @TianyuHe_ and Aditya Cowsik. Thread 🧵
@Andr3yGR
Andrey Gromov
3 months
RT @jxmnop:
@Andr3yGR
Andrey Gromov
4 months
RT @BorisHanin: What an incredible lineup of panelists and researchers! Super excited to attend this.
@Andr3yGR
Andrey Gromov
7 months
Fun collaboration!
@tydsh
Yuandong Tian
7 months
Our new work Spectral Journey shows a surprising finding: when a 2-layer Transformer is trained to predict the shortest path in a given graph, 1️⃣ it first implicitly computes the spectral embedding for each edge, i.e. eigenvectors of the Normalized Graph…
@Andr3yGR
Andrey Gromov
9 months
RT @darshilhdoshi1: Interested in mechanistic interpretability of how Transformers learn in-context via skill composition? Come to our #Neu…
@Andr3yGR
Andrey Gromov
11 months
RT @MBarkeshli: John Hopfield has a nice article in the annual reviews of condensed matter physics. It starts off with a discussion of what…
@Andr3yGR
Andrey Gromov
11 months
RT @MBarkeshli: The Nobel Committee recognizes profound contributions from Physics to ML / AI. There's a lot more where that came from. We…