Alex Alemi
@alemi
Followers: 1K · Following: 90 · Media: 4 · Statuses: 65
Machine Learning Researcher
Kissimmee, FL
Joined January 2008
I am recruiting Ph.D. students for my new lab at @nyuniversity! Please apply if you want to work with me on reasoning, reinforcement learning, understanding generalization, and AI for science. Details on my website: https://t.co/d8uId2LC47. Please spread the word!
17
104
748
Recently I've been playing around with a quarter-order-of-magnitude system for simple calculations. It gives better precision than single-sig-fig calculations using only four very intuitive symbols. https://t.co/BO9mLi8pLF
0
0
8
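A minimal sketch of the idea, assuming it works by rounding to the nearest quarter power of ten (the actual symbols and rounding rules in the linked post may differ; the function names here are mine):

```python
import math

# Hedged sketch of quarter-order-of-magnitude arithmetic: every number is
# rounded to the nearest power of 10**(1/4), so multiplication and division
# reduce to adding and subtracting integer quarter-exponents.

def to_qom(x: float) -> int:
    """Round x to the nearest quarter-order-of-magnitude exponent."""
    return round(4 * math.log10(x))

def from_qom(q: int) -> float:
    """Convert a quarter-exponent back to an approximate value."""
    return 10 ** (q / 4)

# Example: estimate 37 * 820.
q = to_qom(37) + to_qom(820)   # multiplication = exponent addition
print(from_qom(q))             # ~3.2e4, vs. the exact 30340
```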
If you miss the NYTimes needle, especially one that is statistically uniform ( https://t.co/uqLw9f69Sw), you can use this page: https://t.co/xQ5cFrtRSD, which I whipped together to reason about the correlations between the swing states tonight as results come in.
0
1
18
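I don't know what model the linked page actually uses, but a toy sketch of this kind of correlated reasoning looks like the following (all states, margins, and noise scales below are made-up placeholders): share a national swing across states, then condition on a state being called.

```python
import numpy as np

# Hedged toy model of correlated swing-state outcomes (not the page's model).
rng = np.random.default_rng(0)
states = ["PA", "MI", "WI", "AZ", "GA", "NV", "NC"]
prior_margin = np.array([0.0, 1.0, 1.0, -1.0, -1.0, 0.5, -1.5])  # fake margins, points

n = 100_000
national = rng.normal(0.0, 2.0, size=n)                    # shared national swing
state_noise = rng.normal(0.0, 2.0, size=(n, len(states)))  # independent per-state noise
margins = prior_margin + national[:, None] + state_noise

wins = margins > 0
print("unconditional P(win):", dict(zip(states, wins.mean(0).round(2))))

# Conditioning on one state being called shifts the others, because of the shared swing.
pa = wins[:, states.index("PA")]
print("P(MI | PA):", wins[pa, states.index("MI")].mean().round(2))
```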
Is Kevin onto something? We found that LLMs can struggle to understand compressed text unless you do some specific tricks. Check out https://t.co/DRO2IbTFCg and help @hoonkp, @alemi, Jeffrey Pennington, @ada_rob, @jaschasd, @noahconst, and me make Kevin's dream a reality.
0
6
15
Ever wonder why we don’t train LLMs over highly compressed text? Turns out it’s hard to make it work. Check out our paper for some progress that we’re hoping others can build on. https://t.co/mceqpUfZQo With @blester125, @hoonkp, @alemi, Jeffrey Pennington, @ada_rob, @jaschasd
arxiv.org
In this paper, we explore the idea of training large language models (LLMs) over highly compressed text. While standard subword tokenizers compress text by a small factor, neural text compressors...
2
10
76
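To make the basic setup concrete (the paper uses neural compressors and specific windowing tricks; zlib and the helper below are just illustrative stand-ins): compress the text, then chop the bitstream into fixed-width chunks that serve as "tokens."

```python
import zlib

# Hedged illustration: compressed sequences are much shorter than character
# sequences, but each "token" depends on the entire preceding bitstream,
# which is part of what makes this hard for LLMs to learn.

def bitstream_tokens(text: str, bits_per_token: int = 8) -> list[int]:
    compressed = zlib.compress(text.encode("utf-8"))
    bits = "".join(f"{byte:08b}" for byte in compressed)
    return [int(bits[i:i + bits_per_token], 2)
            for i in range(0, len(bits) - bits_per_token + 1, bits_per_token)]

text = "the quick brown fox jumps over the lazy dog " * 20
tokens = bitstream_tokens(text)
print(len(text), "characters ->", len(tokens), "compressed tokens")
```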
PaLM, Google's 540-billion-parameter large language model, used 4.2 moles of flops to train. 4.2 moles!
0
0
9
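For the curious, the number is consistent with the usual 6·N·D back-of-the-envelope for training compute (an approximation, not an official figure):

```python
# Sanity check on "4.2 moles of flops": N ≈ 540B parameters, D ≈ 780B tokens.
AVOGADRO = 6.022e23
params = 540e9
tokens = 780e9
flops = 6 * params * tokens
print(f"{flops:.2e} FLOPs -> {flops / AVOGADRO:.1f} moles")
# ~2.5e24 FLOPs, i.e. about 4.2 moles.
```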
Happy to announce DreamFusion, our new method for Text-to-3D! https://t.co/4xI2VHcoQW We optimize a NeRF from scratch using a pretrained text-to-image diffusion model. No 3D data needed! Joint work w/ the incredible team of @BenMildenhall @ajayj_ @jon_barron
#dreamfusion
128
1K
6K
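Schematically (hedging on the exact weighting and conditioning details, which live in the paper), the trick is to backpropagate the frozen diffusion model's denoising error through the renderer. With $x = g(\theta)$ a rendered view of the NeRF with parameters $\theta$, prompt $y$, timestep $t$, and noise $\epsilon$:

$$
\nabla_\theta \mathcal{L} \;\approx\; \mathbb{E}_{t,\epsilon}\!\left[\, w(t)\,\big(\hat\epsilon_\phi(x_t;\, y,\, t) - \epsilon\big)\,\frac{\partial x}{\partial \theta} \right],
$$

so only the renderer is differentiated; the text-to-image model stays fixed and no 3D data is needed.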
@dpkingma @poolio To accompany the colab, I've also written a blog post https://t.co/qvmp8pg1g6 attempting to make sense of the VDM diffusion loss. In it, I try to motivate how the VDM diffusion loss is simply the joint KL between the forward and reverse processes.
2
11
51
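Stated schematically (the blog post is the careful version; the notation here is mine), with $q$ the data distribution pushed through the fixed forward noising process and $p$ the learned reverse-time generative process over the same trajectory $x_{0:1}$:

$$
\mathcal{L}_{\mathrm{VDM}} \;=\; D_{\mathrm{KL}}\!\big(q(x_{0:1})\,\big\|\,p(x_{0:1})\big) \;\ge\; D_{\mathrm{KL}}\!\big(q(x_0)\,\big\|\,p(x_0)\big),
$$

where the inequality is just the chain rule of KL divergence, which is why minimizing the joint KL serves as a surrogate for maximum likelihood on $x_0$.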
Want to understand and/or play with variational diffusion models?
- See https://t.co/V1jP11fMmI for a simple stand-alone implementation and explanation. (Thanks @alemi and @poolio for making this!)
- See https://t.co/kwlCncttBk for an even more basic implementation on 2D data.
1
63
327
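For a flavor of what the "2D data" version boils down to, here is a hedged, self-contained sketch of the forward noising and the epsilon-prediction loss (the schedule and the stand-in model below are illustrative, not the colab's code):

```python
import numpy as np

# Toy forward-noising + epsilon-prediction loss on 2D data.
rng = np.random.default_rng(0)

theta = rng.uniform(0, 2 * np.pi, size=512)
x0 = np.stack([np.cos(theta), np.sin(theta)], axis=1)   # data: points on a circle

t = rng.uniform(0, 1, size=(512, 1))
alpha, sigma = np.cos(0.5 * np.pi * t), np.sin(0.5 * np.pi * t)  # illustrative schedule
eps = rng.normal(size=x0.shape)
xt = alpha * x0 + sigma * eps                            # sample from q(x_t | x_0)

def eps_hat(xt, t):
    # Placeholder for a small learned network; a real model predicts eps from (xt, t).
    return np.zeros_like(xt)

loss = np.mean(np.sum((eps_hat(xt, t) - eps) ** 2, axis=1))
print("epsilon-prediction loss:", loss)
```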
A pretty cool (and I hope also useful) paper on using pre-trained models to create highly informative priors for downstream tasks. Thanks to all the collaborators, it was a lot of fun!
Pre-Train Your Loss: Easy Bayesian Transfer Learning with Informative Priors. https://t.co/cglYGiLNeM w/@ziv_ravid, @micahgoldblum, @HosseinSouri8, @snymkpr, @Eiri1114, @ylecun 1/6
2
12
79
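A hedged sketch of the idea as I read it (the toy data, prior mean, and diagonal covariance below are made up; the paper's procedure for building the prior is more involved): instead of decaying weights toward zero, regularize downstream training toward a Gaussian prior distilled from the pre-trained model.

```python
import numpy as np

# Toy logistic regression trained to a MAP estimate under an informative Gaussian prior.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(size=200) > 0).astype(float)

prior_mean = rng.normal(size=5)   # stand-in for weights distilled from pre-training
prior_var = np.full(5, 0.1)       # stand-in for a learned diagonal covariance

def grad(w):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))               # logistic predictions
    g_nll = X.T @ (p - y) / len(y)                   # gradient of the task NLL
    g_prior = (w - prior_mean) / prior_var / len(y)  # pull toward the informative prior
    return g_nll + g_prior

w = prior_mean.copy()
for _ in range(500):
    w -= 0.1 * grad(w)                               # MAP estimate under the prior
print("MAP weights:", w.round(2))
print("distance from prior mean:", np.linalg.norm(w - prior_mean).round(2))
```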
1/ Super excited to introduce #Minerva 🦉( https://t.co/UI7zV0IXlS). Minerva was trained on math and science found on the web and can solve many multi-step quantitative reasoning problems.
Very excited to present Minerva🦉: a language model capable of solving mathematical questions using step-by-step natural language reasoning. Combining scale, data, and other ingredients dramatically improves performance on the STEM benchmarks MATH and MMLU-STEM. https://t.co/bQJOyMSCD4
29
519
3K
We are thrilled to announce Imagen, a text-to-image model with unprecedented photorealism and deep language understanding. Explore https://t.co/mSplg4FlsM and Imagen! [Generated image; prompt: "A large rusted ship stuck in a frozen lake. Snowy mountains and beautiful sunset in the background."] #imagen
57
297
2K
here are the next few days' wordle answers as md5 hashes:
2022-01-11 = 0b18a3d7b9c43ff1750d2baa4606b8d0
2022-01-12 = 047fb90408a79f189d51cbcea168b1a5
2022-01-13 = ab3358313efb03210a1babfb372246f1
2022-01-14 = d821e448212defd91ac1e67f9653a34d
3
0
2
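The point of posting hashes is that anyone can verify a guess after the fact without the tweet spoiling the answer. A quick check looks like this, assuming the plain lowercase word was hashed (the guess below is just a placeholder, not a claim about the actual answer):

```python
import hashlib

# Published commitment for one date, copied from the tweet above.
published = {"2022-01-11": "0b18a3d7b9c43ff1750d2baa4606b8d0"}

def matches(word: str, date: str) -> bool:
    return hashlib.md5(word.encode()).hexdigest() == published[date]

print(matches("crane", "2022-01-11"))  # "crane" is a placeholder guess
```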
We are presenting our paper "Does Knowledge Distillation Really Work?" at #NeurIPS2021 poster session 2 today - come check it out! Joint work with @Pavel_Izmailov, @polkirichenko, @alemi, and @andrewgwils. Poster: https://t.co/N4PlsxnpZE Paper: https://t.co/UNSIizi2GG
2
13
78
Excited to kick-start the #SciML focus series on #ML meets info theory and statistical mechanics! Amazing speaker/session chair line-up: @alemi (@wellingmax), @pratikac (Karthik), @ShoYaida (@jaschasd), @yasamanbb (@SuryaGanguli) and Elena Agliari. Details at:
4
33
196
While most papers on knowledge distillation focus on student accuracy, we investigate the agreement between teacher and student networks. Turns out, it is very challenging to match the teacher (even on train data!), despite the student having enough capacity and lots of data.
Does knowledge distillation really work? While distillation can improve student generalization, we show it is extremely difficult to achieve good agreement between student and teacher. https://t.co/VpK6Xy2q3S With @samscub, @Pavel_Izmailov, @polkirichenko, Alex Alemi. 1/10
3
15
114
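For concreteness, the metric in question is top-1 agreement between teacher and student, which can sit well below either model's accuracy. A toy illustration (all predictions below are synthetic; the paper's averaging details may differ):

```python
import numpy as np

# Two models with similar accuracy can still disagree on many inputs.
rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=1000)

def noisy_preds(correct_rate):
    preds = labels.copy()
    flip = rng.random(1000) > correct_rate
    preds[flip] = rng.integers(0, 10, size=flip.sum())  # random guesses on flipped examples
    return preds

teacher = noisy_preds(0.80)
student = noisy_preds(0.80)

print("teacher acc:", (teacher == labels).mean())
print("student acc:", (student == labels).mean())
print("agreement:  ", (student == teacher).mean())  # noticeably below either accuracy
```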