An interesting aspect of this discussion is the fact that LLMs will soon start affecting our thoughts, beliefs, mental & linguistic habits, and culture. The idea that we could select a handful of "trustworthy" institutions with the "correct" set of values and beliefs to shape LLM…
Rumor has it that I don't even have a PhD yet. This is in fact true... 😏
BUT! I am happy to report that I will be graduating before any of the PhD students I'm advising. The thesis is now online and I will be defending Jun 9th, 16.00 CET!
Check it out:
Two weeks ago I joined Meta / FAIR, and I couldn't be more excited about this new chapter. Meta is indeed the only place left that supports highly ambitious long-term oriented & fundamental research projects and has a strong commitment to open science and open source. (and has…
There is literally no other company doing this today:
- open research towards human-level AI
- open source AI platform enabling a huge AI ecosystem
- wearable device to interact with always-on AI assistants
Interested in geometric and equivariant deep learning? Check out our latest paper on Gauge Equivariant CNNs, where we show how gauge theory makes it possible to build CNNs on general manifolds:
This. Don't waste time on domain specific tricks. Do work on abstract & general inductive biases like smoothness, relational structure, compositionality, in/equivariance, locality, stationarity, hierarchy, causality. Do think carefully & deeply about what is lacking in AI today.
The contrast btw Rich Sutton and Shimon Whiteson re the value of injecting human knowledge into models is a good definition of the word “principled”. Sutton's Bitter Lesson is that ad hoc tricks don't hold up. @shimon8282's Sweet Lesson is that deeper (more principled) ideas do.
After LLMs, the next big thing will be LCPs: Large Control Policies. Very general pretrained goal-conditioned policies for embodied agents. If you provide it with a goal vector / example / text, it can do a large number of tasks in a large number of environments. Then we retire🤖
👉 The first law of DL architectures 👈
"Whatever" is all you need 🤯
Any problem that can be solved by a transformer / ViT can be solved by an MLP / CNN, and vice versa [provided you do exhaustive tuning and use the right inductive biases]
Same for RNNs:
A ConvNet for the 2020s
abs:
github:
Constructed entirely from standard ConvNet modules, achieving 87.8% ImageNet top-1 accuracy and outperforming Swin Transformers on COCO detection and ADE20K segmentation
A very clear explanation of an idea that is at the heart of modern mathematics, and geometric deep learning as well: Klein's Erlangen Program and its generalization, here called the isomorphism philosophy.
A short thread on why this matters for AI:
1/
This is how Angela Merkel explained the effect of a higher #covid19 infection rate on the country's health system.
This part of today's press conf was great, so I just added English subtitles for all non-German speakers.
#flattenthecurve
IMO this is the most insightful way to introduce and understand convolution.
Interestingly, group conv and steerable conv on homogeneous spaces can also be derived from symmetry principles. Convolution is all you need!
Have you ever wondered what is so special about convolution? In a new blog post, I show how to derive #convolution from translational symmetry principles:
This is key to extending #DeepLearning to #graphs
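A minimal sketch of the property the post derives (NumPy, toy 1-D circular convolution; the names are mine): convolution commutes with translation, which is exactly the translational symmetry principle at work.

```python
import numpy as np

def conv1d_circular(x, k):
    """Circular 1-D convolution of signal x with kernel k."""
    n = len(x)
    return np.array([sum(x[(i - j) % n] * k[j] for j in range(len(k)))
                     for i in range(n)])

x = np.random.randn(16)                  # toy input signal
k = np.random.randn(5)                   # toy filter
shift = lambda s, t: np.roll(s, t)       # cyclic translation by t

lhs = conv1d_circular(shift(x, 3), k)    # translate, then convolve
rhs = shift(conv1d_circular(x, k), 3)    # convolve, then translate
assert np.allclose(lhs, rhs)             # equivariance: both orders agree
```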
Exciting work from our team towards making neural video compression a reality: running a neural video decoder on a mobile phone in real time.
Check out the demo video at
Next week I will be kicking off the virtual Physics ⋂ ML series with a talk about *Natural* Graph Networks, a new and fundamentally more flexible class of graph networks. Without a doubt the most exciting thing since Gauge CNNs 🔥
Project led by @pimdehaan with @wellingmax
Physics ∩ ML is going virtual! If you're interested in the interface of theoretical physics and ML, come hear talks by @TacoCohen, Phiala Shanahan, Ard Louis, and @hashimotostring. More info at .
Short but sweet paper on recurrent autoencoder architectures for speech compression. We systematically explore the space of RNN-AEs and show that the best method, dubbed FRAE, outperforms classical codecs by a large margin. Check it out!
I am thrilled to announce our paper “Feedback Recurrent AutoEncoder” was accepted at #ICASSP2020! Collaboration with Yang Yang, @TacoCohen, and Jon Ryu. A quick thread.
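A minimal sketch of the feedback idea (PyTorch; layer sizes and names are illustrative, not the paper's exact architecture): the decoder's recurrent state is fed back into the encoder, so the encoder only has to code what the decoder cannot already predict from its own state.

```python
import torch
import torch.nn as nn

class FeedbackRecurrentAE(nn.Module):
    def __init__(self, x_dim=64, z_dim=8, h_dim=128):
        super().__init__()
        self.h_dim = h_dim
        self.enc = nn.Linear(x_dim + h_dim, z_dim)  # encoder sees decoder state
        self.dec_cell = nn.GRUCell(z_dim, h_dim)    # recurrent decoder
        self.out = nn.Linear(h_dim, x_dim)

    def forward(self, x_seq):                       # x_seq: (T, B, x_dim)
        h = x_seq.new_zeros(x_seq.size(1), self.h_dim)
        recon = []
        for x in x_seq:                             # one frame at a time
            z = self.enc(torch.cat([x, h], dim=-1)) # code the residual info
            h = self.dec_cell(z, h)                 # update decoder state
            recon.append(self.out(h))               # reconstruct the frame
        return torch.stack(recon)

x = torch.randn(10, 4, 64)                          # T=10 frames, batch of 4
print(FeedbackRecurrentAE()(x).shape)               # torch.Size([10, 4, 64])
```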
We're looking for summer interns at Qualcomm AI Research in Amsterdam! Interested in working on causal rep. learning & RL (my team), compression/generative models, combinatorial opt., model efficiency, federated learning, wireless, perception? Apply now!
If we solve all benchmarks with ~current tools + large scale systems engineering, we will have learned that intelligence is a mirage; a bunch of domain-specific tricks.
Imo this'd be profound, on par with "earth is just another planet" & "humans are just another kind of animal"
there is a scary possibility that we may solve all the benchmarks we come up with for AI... without understanding anything fundamentally deep about what intelligence is about
a bummer for those like me who see AI as a fantastic way to unlock deeper insights on human intelligence
Interested in generative modelling and image/video/audio compression? Qualcomm AI Research is hiring researchers in this exciting area in Amsterdam and San Diego!
Harm's Law of Smol Models (HLSM) tells us how much we need to scale up the data size (k_D) as we scale down the model size (k_N), if we wish to preserve the loss of a Chinchilla-optimal model.
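A minimal sketch of the computation (my own code, plugging in the approximate parametric-loss fit L(N, D) = E + A/N^a + B/D^b from Hoffmann et al., 2022): pick a model-shrink factor k_N and solve for the data-scale factor k_D that leaves the loss unchanged.

```python
# Approximate Chinchilla fit constants (Hoffmann et al., 2022).
E, A, B, a, b = 1.69, 406.4, 410.7, 0.34, 0.28

def k_data(k_N, N, D):
    """Data scale-up k_D preserving the loss of (N, D) at model size k_N * N."""
    reducible = A / N**a + B / D**b       # reducible loss to preserve
    rest = reducible - A / (k_N * N)**a   # what B / (k_D * D)**b must equal
    return (B / (rest * D**b)) ** (1 / b)  # valid while rest > 0

# E.g., shrink a ~Chinchilla-optimal 70B model (1.4T tokens) by 3x:
print(k_data(1 / 3, 70e9, 1.4e12))        # roughly 2.6x more data needed
```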
Super excited to present our latest work in GDL: The Geometric Algebra Transformer (AKA GATr 🐊)
Combines the scalability of a transformer with general-purpose GA features & full E(3) equivariance. Check out the thread below! ⬇️
Are you dealing with geometric data, be it from molecules or robots? Would you like inductive biases *and* scalability?
Our Geometric Algebra Transformer (GATr 🐊) may be for you.
New work w/ @pimdehaan, Sönke Behrends, and @TacoCohen:
1/9
A lot of people are skeptical that self-training can work. But the story of Ramanujan shows that once a certain threshold of intelligence is crossed, pure self-training in mathematics is possible even without an external reward signal provided by a proof checker.
Dear ML twitter,
Not to fear monger, but the mathematicians are closing in on us. They just reached 1999 and reduced Roweis & Ghahramani's epic paper to a slick 2-pager:
Very exciting result: equivariance changes the exponent of the scaling law!
Equivariant nets really do *learn faster* [provided the problem has the relevant symmetries]
Rotation equivariant Steerable G-CNNs are now state of the art on tumor classification, nuclear segmentation and gland segmentation. Very exciting to see G-CNNs being used more and more in medical imaging, and working so well!
[1/6] We are pleased to announce our paper ‘Dense Steerable Filter CNNs for Exploiting Rotational Symmetry in Histology Image Analysis’
paper:
code:
@nmrajpoot @TIAwarwick
The Good Regulator Theorem states that a maximally simple regulator of a system must contain a model of that system.
A regulator is kind of like a policy that controls the system to keep its outputs in some desired range. To be a model means that there exists a homomorphism from…
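A minimal way to write that condition (notation mine, sketching the general idea rather than completing the truncated thread): with system dynamics f : S → S and regulator/model dynamics g : R → R, a map h : S → R is a homomorphism when abstracting commutes with the dynamics.

```latex
% One common formalization (my notation, not the thread's):
% h : S -> R is a homomorphism of dynamical systems iff
\[
  h\bigl(f(s)\bigr) = g\bigl(h(s)\bigr) \qquad \text{for all } s \in S,
\]
% i.e. "run the system, then abstract" = "abstract, then run the model".
```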
Do language models have an internal world model? A sense of time? At multiple spatiotemporal scales?
In a new paper with @tegmark we provide evidence that they do by finding a literal map of the world inside the activations of Llama-2!
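A minimal sketch of the probing methodology (random stand-in data; the paper's actual setup, model layers, and datasets differ): fit a linear probe from hidden activations of place-name prompts to their coordinates and check the held-out fit.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

acts = np.random.randn(1000, 4096)    # stand-in for layer-k activations
coords = np.random.randn(1000, 2)     # stand-in for (lat, lon) targets

X_tr, X_te, y_tr, y_te = train_test_split(acts, coords, random_state=0)
probe = Ridge(alpha=1.0).fit(X_tr, y_tr)
print("held-out R^2:", probe.score(X_te, y_te))  # high R^2 => a linear "map"
```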
Eliminating All Bad Local Minima from Loss Landscapes Without Even Adding an Extra Unit
It's less than one page. It may be deep. It may be trivial. It will definitely help you understand how some claims in recent theory papers could possibly be true.
"Identifiability proofs", which are conspicuously absent for all modern AI methods that actually work, are considered indispensable in the causal inference & causal representation learning communities. Without a proof, the method is not "truly causal".
More evidence that roto-translation equivariant G-CNNs outperform conventional CNNs by a large margin on medical imaging problems with rotation symmetry. G-CNN on 25%-50% of data outperforms CNN on 100% (+data augmentation). Great paper with lots of details & careful experiments.
Great work by Maxime Lafarge indeed! It shows that group CNNs again consistently outperform regular CNNs and it shows the power of G-convs with a fine rotation resolution (finer than standard 90 degree rotations). Includes a careful analysis of obtained equivariance of the nets.
A beautiful demonstration of the mathematical fact that it is not possible to map a non-trivial orbit of SO(3) [the rotating car] to a Euclidean latent space in a continuous and invertible manner.
More research needed!
GANs may be evaluated based on how smooth (disentangled) the latent space interpolations are. It is impressive how #StyleGAN can interpolate between different orientations - even with no concept of 3D.
If true, this would be a big vindication for equivariant nets.
...Dreaming of a day I will give a talk and nobody asks why we don’t just do data augmentation... 😌
There is still quite a bit of mystery around the details of @DeepMind's AlphaFold 2, but equivariance & symmetries may have played a significant role in their success.
This is @JustasDauparas's and my take 🧐:
👉👉👉Applications are now open for internships at Qualcomm AI Research! 👈👈👈
Apply now to work with our amazing team on topics ranging from model compression to RL, federated learning, generative models, causality and more.
e2cnn: A comprehensive library for easy construction of rotation-reflection-translation equivariant CNNs in @PyTorch + a thorough experimental study of equivariant network architectures. By @_gabrielecesa_ and @maurice_weiler.
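A minimal sketch of typical e2cnn usage (an 8-fold rotation-equivariant conv layer; the hyperparameters are illustrative, not from the paper):

```python
import torch
from e2cnn import gspaces
from e2cnn import nn as enn

r2_act = gspaces.Rot2dOnR2(N=8)                               # C8 rotations
feat_in = enn.FieldType(r2_act, [r2_act.trivial_repr])        # grayscale input
feat_out = enn.FieldType(r2_act, 16 * [r2_act.regular_repr])  # 16 regular fields

layer = enn.SequentialModule(
    enn.R2Conv(feat_in, feat_out, kernel_size=5, padding=2),
    enn.ReLU(feat_out),
)
x = enn.GeometricTensor(torch.randn(1, 1, 33, 33), feat_in)
y = layer(x)  # rotating x by multiples of 45 degrees rotates y accordingly
```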
Check out our poster #143 on general E(2)-Steerable CNNs tomorrow, Thu 10:45AM.
Our work solves for the most general isometry-equivariant convolutional mappings and implements a wide range of related work in a unified framework.
With @_gabrielecesa_ #NeurIPS2019 #NeurIPS
Hardly anyone believes that LLMs learn or think the way humans do, but if you are instead looking for the essence of intelligence, compression (what LLMs are trained for) is a decent starting point.
Had some more printed, so still have a few copies! DM your address if you want one, I only charge for shipping and even that is free if you can’t afford it.
Nice example of theory informing practice:
Tune hyperparams on a small model using µParametrization (µP), transfer them to a large model without further tuning. Big deal if it works as advertised.
Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
By transferring from 40M parameters, µTransfer outperforms the 6.7B GPT-3, with tuning cost only 7% of total pretraining cost.
abs:
repo:
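A minimal sketch of the µTransfer recipe (using the microsoft/mup package; the toy MLP and widths here are mine): set base shapes from a small proxy model, then reuse the proxy's tuned learning rate at full width.

```python
import torch.nn as nn
from mup import MuReadout, set_base_shapes, MuAdam

class MLP(nn.Module):
    def __init__(self, width):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(32, width), nn.ReLU())
        self.head = MuReadout(width, 10)    # mup-aware output layer

    def forward(self, x):
        return self.head(self.body(x))

model = MLP(width=4096)                     # the big target model
set_base_shapes(model, MLP(width=64), delta=MLP(width=128))
opt = MuAdam(model.parameters(), lr=3e-3)   # lr tuned once on the small proxy
```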
Chainer continues to amaze me. With a tiny team, they built a DL framework that is competitive with or superior to the major (well-funded) DL frameworks in terms of speed, ease of use, and features (e.g. Chainer pioneered dynamic computation graphs).
Released Chainer/CuPy v4.0.0!
#Chainer: Major performance improvements including TensorCore support and iDeep backend, NCCL2 support, Caffe export.
#CuPy: CUDA 9.1 support, wheel package, FFT support, etc. More in the blog post and release notes.
There has been some discussion on ML twitter about the meaning of the word compositionality. It is a word that, like "disentangling", has many meanings. But there is a mathematical framework that captures all of them: category theory.
On the topic of compositionality: I was recently tasked with giving a talk on the topic (what do people mean, how do they measure, how to achieve it etc) ->
First steps towards learning representations that respect the topology of the data manifold: "Explorations in Homeomorphic Variational Auto-Encoding" by @lcfalors @pimdehaan @im_td @nicola_decao, M. Weiler, P. Forre, yours truly.
Check out poster 19 at #TADGM #ICML2018
Finding applications of inapplicable math is one of my favorite things. So I would like to take this opportunity to apologize to all the group representation theorists, non-commutative harmonic analysts, differential geometers, and fiber bundlists whose work I have made use of.
In deep learning, it is acceptable to add an inductive bias to your model, but only if you don't understand why it works. Understanding things via mathematics was already tried by the SVM folks and it didn't work.
Yuval Harari (@harari_yuval) noted in Sapiens that humans may be unique in their ability to imagine non-existing things like a person with a lion's head. Interestingly, generative models already appear to be quite good at this.
Check out @Qualcomm #AI Research's latest breakthrough: the world’s first software-based neural video decoder running HD format in real-time on a commercial smartphone. Learn more:
🚨 Hiring Alert🚨
The FAIR CodeGen team in Paris is looking for research engineers! Come join this super talented team, help release open models to the world, and push the frontiers of code generation research!
Check out my new work with @wellingmax on Deep Scale Spaces (link: ). We develop a new kind of 'semigroup convolution', generalizing the group conv of @TacoCohen, and present the connection with classical scale-spaces from CV.
To me, the current phase is even more exciting than the last. To make progress, we need to rethink foundations: causality and explanation, learning without rewards, common sense reasoning, etc.. Not easy, but certainly tractable.
As the new decade gets underway, AI appears to be transitioning to a new phase. But what does it look like? I spoke to academics and researchers at companies like Facebook, DeepMind, and Microsoft to try and find out
Any theory that explains how or why neural nets work so well should be consistent with the fact that NNs that don't throw away any information until the very last layer work just fine.
Green AI: "[Deep Learning] computations have a surprisingly large carbon footprint. [...] This position paper advocates a practical solution by making efficiency an evaluation criterion for research along-side accuracy and related measures"
Looking forward to an in-person NeurIPS!
I will be at the Qualcomm booth Tue & Wed from 9-11 and 13-15. Stop by anytime or send me a DM if you want to chat!
Happening today! OmniCV workshop @ CVPR. I’ll be giving a (pre-recorded) talk on Spherical CNNs, Icosahedral CNNs, Gauge CNNs, Mesh CNNs and all that, and doing a live Q&A
I highly recommend this course on Equivariant DL by Erik Bekkers. It does a great job covering the fundamentals as well as recent developments. Check it out!
WIRED: the AlphaStar Transformer-LSTM-AutoRegressive-PointerNet cognitive architecture
TIRED: deep learning is just curve fitting
EXPIRED: arguments about symbolic AI
We present the attentive group convolution, a generalization of the group convolution that uses attention during the group convolution to focus on relevant symmetry combinations. It generates equivariant attention maps as well.
@erikjbekkers @jmtomczak
I've written a JAX version of the great _escnn_ () Python library for training equivariant neural networks by @_gabrielecesa_. It's over there! Hope you'll find it useful 🙌
I've been saying this for a while now. Having a prior belief about the value of a meaningless parameter makes no sense. Important corollary: number of parameters is not a great measure of model complexity.
Another exciting workshop coming up: "Towards learning with limited labels: Equivariance, Invariance, and Beyond". With talks by Bengio, Poggio, Soatto, Gupta, Pathak & yours truly. Submissions due May 20th! (2 days after the NIPS deadline)
Very excited about this project and the future possibilities for instance-adaptive compression. Great work by joint first authors @tivaro & @IamHuijben!
In our new paper with @IamHuijben and @TacoCohen (accepted at #ICLR2021), we improve neural I-frame compression by 1 dB by overfitting the full compression model on the data instance that we want to transmit! (1/3)
The bitter-sweet lesson: methods that can efficiently leverage compute & data work best, but you still need to respect the symmetries. #geometricdeeplearning #compchem
We’re hiring PhD interns to work on code generation research at FAIR in EMEA! Please apply at if you’re interested in research on Code Llama, LLMs, code generation, compilers, reinforcement learning.
Latest news from Equivariland: "Clebsch-Gordan Networks: a Fully Fourier Space Spherical Convolutional Neural Network", by @risi_kondor, Zhen Lin & @_onionesque. Easy to implement and numerically stable 3D rotation-equivariant networks.
Our new paper: "Clebsch-Gordan Networks: a Fully Fourier Space Spherical Convolutional Neural Network"
The architecture here avoids forward and backward Fourier transforms needed in prior art by making use of the C-G transform as the non-linearity.
New PhD project on geometric DL for spatiotemporal data in Amsterdam by @egavves! (I will serve as industry co-supervisor)
The project is quite open ended, so lots of room for your input. Great opportunity to work in an exciting area with top-notch colleagues in the QUVA lab.
Interested in 'Geometric Deep Learning of Space and Time'? The portal is now online!
Apply *now* for our ELLIS PhD program for a PhD position at the QUVA Lab of the University of Amsterdam, with @TacoCohen!
#ECCV2022 #NeurIPS2022
@NandoDF
Another issue is paper length. Many of the tech reports on LLMs and code models are necessarily very long and won’t fit into 8 pages.
Maybe there should be a special venue for such engineering-heavy research?
Constrained optimization has several practical advantages over the standard beta-VAE (rate/distortion) loss for training compression models. Check out the paper! 👇
Still training β-VAEs for lossy compression? Why not use constrained optimization?
Have a look at our CLIC CVPR paper: Lossy Compression with Distortion Constrained Optimization
Joint work with @TacoCohen and @gsautiere
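A minimal sketch of the constrained-optimization idea (toy model and my own names; not the paper's exact algorithm): instead of hand-picking a fixed β, minimize rate subject to a distortion budget via dual ascent on a Lagrange multiplier.

```python
import torch
import torch.nn as nn

enc, dec = nn.Linear(16, 4), nn.Linear(4, 16)          # toy autoencoder
opt = torch.optim.Adam([*enc.parameters(), *dec.parameters()], lr=1e-3)
lam, lam_lr, target_D = torch.tensor(1.0), 0.01, 0.05  # multiplier & budget

for step in range(1000):
    x = torch.randn(64, 16)
    z = enc(x)
    rate = z.pow(2).mean()                   # toy stand-in for a rate term
    distortion = (dec(z) - x).pow(2).mean()  # reconstruction error
    loss = rate + lam * distortion           # primal: minimize the Lagrangian
    opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():                    # dual: ascend on the constraint
        lam = (lam + lam_lr * (distortion - target_D)).clamp(min=0.0)
```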
Physics ∩ ML is now listed on , with easy calendar sync.
Come hear @TacoCohen tomorrow @ 12:00 EDT on "Natural Graph Networks." Info sent via mailing list, register at .